Genetics without borders


A UK government scheme to establish nationality through DNA testing is scientifically flawed,
ethically dubious and potentially damaging to science.


Until a few years ago, the genetic variation of humans was
understood only in terms of superficial characteristics, such as
hair and skin colour. Today, thanks to the advent of cheap, fast
genetic sequencing and DNA-microarray technologies, population
geneticists can chart such variations in a more systematic way. Yet
most experts agree that these studies are still in their infancy.

So it was with understandable incredulity that researchers received
a plan by the UK Border Agency to use genetics to determine
nationality: specifically, the origin of asylum-seekers claiming to be
from war-torn Somalia. The agency’s pilot programme, which began last
month, aims to determine whether some 100 individuals really are
Somali nationals by checking them for the individual DNA variants
known as single nucleotide polymorphisms (SNPs) in mitochondrial
DNA, on the Y chromosome and elsewhere in the genome. The
scheme will also use isotopic ratios of elements found in hair and
fingernails (which can vary depending on a person's diet or
environment) to try to establish where the migrants previously lived.

The border agency says that the project has undergone scientific
peer review, although it is difficult to say by whom: several geneticists
contacted by Nature saw a preliminary proposal from the UK
government in 2007, and warned that it was unlikely to work.

It is true that the recent development of large SNP databases has
made it possible to determine the geographic origins of Europeans to
within a few hundred kilometres (see Nature 456, 98-101; 2008). But
comparable data on many human populations, especially in regions
such as Africa, remain patchy at best, and it is unclear what data the
border agency will use to establish the origins of these particular
asylum-seekers.

On a more fundamental level, the idea that genetic variability follows
man-made national boundaries is absurd. Cross-border migration
is common throughout the world; Y-chromosome analysis can easily 
be thrown off by a distant male ancestor; and SNP-based identifi- 
cations are inexact to say the least. As an example of this last point, 
individuals whose parents come from two geographic regions are often 
classed into a third region from which neither parent originated. 

The use of isotopic analysis for identifying nationality is also
unproven. Although it may be possible to use isotopic ratios to
determine the region in which a person has recently lived, it cannot
provide definitive evidence of their origins.

These problems seem to be ignored in 
the guidelines provided to border agents testing the asylum-seekers. 
Given the scientific credibility of DNA evidence, it is not difficult 
to imagine that these agents — who are presumably not geneticists 
— might place undue weight on results that are, at best, difficult to 
interpret and, at worst, spurious. 

Migration organizations and geneticists alike have been vocal 
in their protests against the plan, and in response the UK govern- 
ment seems to have backpedalled. In a statement released earlier 
this week from the Home Office, which runs the border agency, 
the programme was described as only a proof-of-concept project 
that would not be used to make decisions about any asylum-seeker. 
But the government should cancel this scientifically dubious and 
politically sensitive programme outright. If it is allowed to continue,
it could easily lead to a public backlash in the very populations that 
geneticists need to study to understand human origins and the 
genetic underpinnings of disease. Geneticists, and indeed all scien- 
tists, should decry the plan and make it clear that the science does 
not support it.


Putting DNA to the test 


Genetic-testing companies lack regulation, and a
list of guiding principles does not go far enough. 


The availability of affordable, direct-to-consumer genetic tests
has mushroomed, leaving regulation lagging behind. Dozens
of companies now offer inexpensive home kits that allow people
to spit into tubes, send the samples for DNA analysis and receive
a report that allegedly details their ancestry or their possible
susceptibility to a long list of disorders that have been linked, often
tenuously, to particular genes. But the value of these tests remains
debatable, which is why the industry needs a strong set of quality
standards and codes of conduct to protect both its consumers and
its own credibility.


The UK Human Genetics Commission (HGC) took a welcome 
step in that direction last month when it issued a set of principles to 
help guide consumers and to promote high standards and consistency 
among personal-genomics providers. But the HGC’s guiding princi- 
ples — which are under public review until early December — focus 
largely on reining in companies’ promotional messages so that they 
reflect the limited utility of genetic testing, and to make would-be 
customers more aware of what they can realistically expect to learn 
from the tests. Most DNA testing companies say they are already 
doing just that, emphasizing that what they provide is information, 
not medical diagnoses. 

The question is what happens if or when prices drop further and 
the tests become more popular. They are already being marketed over 
the Internet with little oversight, and it seems likely that increasing 
numbers of people will be turning to personal-genomics companies 
in search of definitive answers about how to improve or safeguard 
their health. But the available answers are rapidly becoming less
definitive: the ever-increasing number of genome-wide association 
studies, which provide a major portion of the genetic markers for 
disease risks, offer results that even researchers can find hard to inter- 
pret, and often flag up markers that are not the most useful predictors 
of complex traits (see page 712). 

This means that customers will frequently receive results telling 
them only that they face the ambiguous possibility of a somewhat 
elevated risk of a little-understood disorder. Presumably, most 
customers in that position will simply be more vigilant about disease 
screening. But if the ambiguous, slightly elevated risk relates to a 
frightening condition such as breast cancer, some individuals might 
feel compelled to undertake drastic and perhaps needless measures, 
such as prophylactic mastectomy. 

The HGC’s principles, if adopted, should help to minimize such 
panic reactions. For example, the HGC advocates that genetic coun- 
selling be provided both before and after testing for serious hereditary 
diseases. But there is room to go further and, on page 724, Craig
Venter and his colleagues offer nine recommendations for how to
do so. For example, Venter’s team urges companies to agree on a core
set of non-ambiguous genetic markers — ones that put a carrier at 
high risk of developing a specific condition. Companies would be 
well-advised to follow this recommendation, as such an agreement 
would help to avoid conflicting messages and inconsistent results 
across the industry. 

Ultimately, however, government regulators may feel compelled 
to step in on the grounds that industry self-policing no longer offers 
consumers adequate protection. The US Food and Drug Adminis- 
tration has already recruited bioethicist Alta Charo, in part to advise 
commissioner Peggy Hamburg on a comprehensive approach to 
regulating these tests. 

Government regulators should proceed with care, given the 
dizzying speed at which the science of personalized genomics is 
advancing. But in the interim, DNA-test providers should up their 
game by providing only clinically useful information and spelling out 
exactly how much biology remains unknown.


How to win trust over flu 


Mass-vaccination campaigns for the pandemic 
H1N1 virus must take public concerns into account. 


As countries roll out their campaigns for large-scale vaccination
against pandemic H1N1 flu, a poll released last week by the
Harvard School of Public Health in Boston, Massachusetts,
found that only four in ten US adults have definitely decided to get
vaccinated themselves, and just half plan to get the shot for their
children (go.nature.com/wiB8V3). Harvard’s results parallel those
from other surveys, both inside and outside the United States, all of
which suggest that many people are still dubious about the vaccine.
Public-health authorities, who are keen to contain the pandemic’s
spread, need to realize that their best hope of dealing with such public
reluctance is to patiently address the concerns that underlie it.

Sometimes, it’s true, those concerns go beyond any appeal to reason.
They grow out of a visceral mistrust of authority in general, and
of government, regulatory agencies, medical researchers and
multinational pharmaceutical companies in particular. A sophisticated
anti-vaccine movement has emerged that plays on this wariness, and
helps to feed the conspiracy theories about the H1N1 vaccine that are
circulating on the Internet and in viral e-mails.

But far more often, say researchers who have studied this subject,
people are assessing vaccination through a perfectly rational
cost-benefit analysis. There is a widespread public perception, for
example, that the vaccine’s safety trials have been rushed (the Harvard
study found that possible side effects were respondents’ main concern)
and that H1N1 flu is mild. As a result, many feel no urgent need to be
vaccinated, preferring to hold off until they see how the first phases of
the vaccination programme go. Indeed, the Harvard poll also found that
some 60% of those who don’t intend to get a shot are open to changing
their mind if people in their community become severely ill or die.

Such deliberations reflect a perfectly legitimate decision-making
process, says Peter Sandman, a risk-communication consultant in
Princeton, New Jersey. And governments, he advises, should frame 
their public-education campaigns in ways that respect people's judge- 
ment and their wait-and-see attitude. Research in risk communication 
strongly shows that when over-eager officials pressure members of the 
public who are already sceptical and ambivalent, while being openly 
dismissive of public concerns, they only end up stoking resistance. 

Instead, officials should focus on providing people with the 
information they need to make good choices for themselves. This 
should include reminders that coincidences do happen: in any mass- 
vaccination campaign, at least a few people will fall ill immediately 
after receiving their shot for reasons that have nothing to do with the 
vaccine, a possibility vividly highlighted last week by the death of a
14-year-old British girl hours after receiving a vaccine against human 
papilloma virus. Regulatory authorities need to better explain the 
extensive safety tests that vaccines undergo and, at the same time, 
build confidence by being utterly transparent in the reporting and 
investigation of any suspect events linked to vaccination. 

The public-education campaign should also correct the 
misconception that H1N1 flu is mild. It is mild in most who catch it. 
But for those individuals — mainly young adults — who will develop 
the severe form, H1N1 is life-threatening. Moreover, because the 
virus is new and immunity is lacking, many more people will get 
it than is typical for seasonal flu, and the toll of serious illness and 
deaths will accordingly be that much higher. 

Finally, people should be reminded that vaccination isn’t just about 
protecting themselves; it’s also about not spreading the flu to oth- 
ers, which, importantly, alleviates pressure on overstretched hospi- 
tals. Campaigns should give altruistic appeals far more prominence 
than they typically have in the past; research shows that they can be 
surprisingly effective. 

More generally, for officials and researchers at all levels, the scepti- 
cism over the pandemic vaccine should serve as a timely reminder of 
the imperative to work to gain greater public trust in science-based 
advice and in those who give it.
RESEARCH HIGHLIGHTS


CANCER BIOLOGY 


Stem cell-cancer link 


Nature Genet. doi:10.1038/ng.465 (2009) 

The SOX2 gene, famous for its role in helping 
to reprogram adult cells into stem cells, is also 
a cancer driver. 

Matthew Meyerson of the Dana-Farber 
Cancer Institute in Boston, Massachusetts, 
and his colleagues searched genome-wide for 
tumour-promoting genes in human samples 
of lung and oesophageal squamous-cell 
carcinomas. They found that a region around 
SOX2 was frequently replicated in both 
diseases. SOX2 expression is necessary for the 
growth of lung and oesophageal squamous- 
cell cancer lines. Overactivating SOX2 also 
turned normal cells cancerous with help from 
a couple of other genes. 


ECOLOGY 


Wildebeest chain reaction 


PLoS Biol. 7, e1000210 (2009)

One change in an ecosystem can have 
far-reaching effects. This is evident in the 
Serengeti in East Africa, where tree density 
has increased since the 1960s, when the 
rinderpest virus, which attacks wildebeest, 
was eradicated. To figure out what the 
connection between these events might be, 
Ricardo Holdo of the University of Florida 
in Gainesville and his colleagues compared 
ten models of tree and fire dynamics on the 
famous savannah. 

The researchers conclude that after the 
disease was wiped out, wildebeest grew in 
number and ate more grasses. With less grass 
to burn, fires decreased in frequency and 
more seedlings were able to grow to maturity. 
Other factors such as climate change and 
browsing by elephants seemed to have less of a
role. The team adds that this shift means that 
the Serengeti may have become a carbon sink. 
Boys against girls

Science doi:10.1126/science.1174705 (2009)
Male cichlid fishes in East Africa's Lake Malawi have evolved
striking coloration (top right) to compete for females. The objects
of their affections, meanwhile, tend to sport inconspicuous brown
scales (top left).

An exception is the ‘orange-blotch’ trait, which is found almost
exclusively in females (bottom left) and provides them with
camouflage. When it does occur in males, it disrupts their patterning
(bottom right), reducing their fitness. Thomas Kocher and his
colleagues at the University of Maryland in College Park have found
that this trait is caused by a mutation in the Pax7 gene, which is
tightly linked to a new female-sex-determining gene. The close
linkage between the mutated gene and the female sex determiner
ensures that orange-blotch is expressed mainly in females. Such
sexual conflicts can lead to the evolution of new sex-determining
systems and many other traits, the team suggests.

R. ROBERTS, UNIV. MARYLAND


GEOSCIENCE

Earth's magnetic personality

Geochem. Geophys. Geosys. doi:10.1029/2009GC002496 (2009)
When did our planet develop the roiling convection pattern that
churns its metallic core and gives rise to its magnetic field?
Ancient rocks from South Africa's Barberton greenstone belt reveal
that such currents in the core must have started by 3.45 billion
years ago, significantly earlier than had been established from
previous rock evidence, report John Tarduno of the University of
Rochester in New York and his colleagues. The rocks sport a magnetic
signature indicating that the planet had developed a substantial
magnetic field by that time. Past work by Tarduno and his co-workers
had provided evidence that a significant magnetic field was present
by 3.2 billion years ago.


ANALYTICAL CHEMISTRY

Gloop monitor

Angew. Chem. Int. Edn doi:10.1002/anie.200902360 (2009)

Mass spectrometry is an invaluable tool for analysing substances’
molecular compositions. But using it on viscous liquids such as
toothpaste has required the time-consuming step of taking selective
extracts from the liquids. Now these sticky complex mixtures can be
monitored directly in a flask.

Renato Zenobi at the Swiss Federal Institute of Technology in Zurich
and Huanwen Chen at the East China Institute of Technology in Fuzhou
and their colleagues blew nitrogen gas through samples of toothpaste,
honey and olive oil to create bubbles that carry molecules up to the
sample's surface. There, the bubbles burst, creating aerosols that
can be analysed in a standard mass spectrometer.

The team used the technique to track the progress of chemical
reactions in viscous ionic liquids, which are increasingly popular
solvents.


AGEING

Live longer, but how?

Science 326, 140-144 (2009)
Caloric restriction extends the lifespan of many model organisms, a
finding that has prompted some people to drastically reduce their
food intake in the hope of upping their longevity. But how exactly
caloric restriction staves off death remains unknown.

Dominic Withers of University College London and his colleagues
found that mice in which the gene S6k1 was deleted lived for about
80 days (or 9%) longer than control mice, with females surviving 20%
longer. The animals were also less likely to develop certain signs of
ageing, such as loss of insulin sensitivity. The gene-expression
patterns of the mice were similar to those in mice undergoing
long-term caloric restriction, suggesting that manipulating S6K1
signalling could be a strategy for researchers seeking drugs that
mimic the positive effects of this regime.


MICROBIOLOGY

Bacteria fight back

J. Exp. Med. doi:10.1084/jem.20090097 (2009)
A bacterium that causes many skin and bloodborne infections and the
bacterium responsible for anthrax both synthesize the same molecule
to evade host immune responses.

Olaf Schneewind and his colleagues at the University of Chicago in
Illinois examined the ability of Staphylococcus aureus (pictured
below) to survive in rodent blood. They found that an enzyme anchored
to the bacterium’s cell wall produces adenosine, a key signalling
molecule, during infection to protect the bacteria from attack by
white blood cells. The anthrax pathogen Bacillus anthracis uses the
same mechanism for survival.


CHEMISTRY

Microwave magic

Angew. Chem. Int. Edn doi:10.1002/anie.200904185 (2009)

Microwave irradiation is commonly used to boost the speed and yield
of chemical reactions, but how it works has been unclear. Debate
centres on whether it is due to the heat supplied or to some effect
of the microwaves’ electromagnetic field on the reaction components.

Oliver Kappe and his colleagues at Karl Franzens University in Graz,
Austria, have separated the two effects using silicon carbide vials,
which transmit the heat but block out the electromagnetic field.

When they measured reaction time and product yield in 18
microwave-enhanced reactions, the researchers obtained almost
identical results with silicon carbide vials as with Pyrex
containers. This suggests that in most cases heat is responsible for
the benefits of microwave chemistry.


STEM-CELL BIOLOGY

Rebooting cord blood cells

Cell Stem Cell 5, 434-441; 353-357 (2009)
A well-known cocktail of genes can reset many adult cells to
‘pluripotency’, a state from which they can develop into almost any
tissue. Now, two groups have derived induced pluripotent stem (iPS)
cell lines from umbilical cord blood, a source that could be
clinically useful.

Ulrich Martin of Hannover Medical School in Germany and his
colleagues created the cells from cord blood using four genes, OCT4,
SOX2, NANOG and LIN28. Juan Carlos Izpisúa Belmonte at the Salk
Institute for Biological Studies in La Jolla, California, and his
collaborators generated iPS cells using as few as two genes, OCT4 and
SOX2.

Cord-blood cells have not acquired as many mutations as other cells,
so stem cells such as these might be less prone to turn cancerous if
used as therapy. Nevertheless, both groups used viruses to insert the
genes, reducing the cells’ direct therapeutic utility.


MATERIALS SCIENCE

No gas from glass

Nature Mater. doi:10.1038/nmat2542 (2009)

Where temporary surgical implants are concerned, materials that
decompose safely over time eliminate the need for costly and painful
removal. Magnesium in its crystalline form has been used in some
devices because it is about as strong as bone. But when it corrodes,
it releases hydrogen gas, raising the risk of gas pockets being
formed in tissues.

Jörg Löffler and his colleagues at the Swiss Federal Institute of
Technology in Zurich sharply reduced gas release by creating glassy
magnesium zinc alloys. The gas reduction, which took place only if
the alloy’s zinc content was at least 28%, happened because a dense
outer layer of zinc oxide or carbonate forms when the magnesium in
these alloys decomposes, preventing hydrogen from forming bubbles.


JOURNAL CLUB

Judith E. Mank
Edward Grey Institute, Department of Zoology,
University of Oxford, UK

An evolutionary biologist compares genomic complexity to modern art.

Like many students of evolutionary biology, I was taught that genes
encode physical traits, or ‘phenotypes’, that are the focus of
natural selection: a model with clear, direct links and few, if any,
complications. Over the past few years, I have found it increasingly
difficult to reconcile this simple model connecting genes and the
organisms they encode with the burgeoning data of systems biology,
which show the genome as a heaving tangle of interconnections. Given
the complexity of the genome, how can selection target any single
gene without unintended consequences?

Trudy Mackay at North Carolina State University in Raleigh and her
collaborators have begun to resolve the opposing genomic and
evolutionary world views by examining the systems genetics that
underlie phenotypes in the fruitfly Drosophila melanogaster
(J. F. Ayroles et al. Nature Genet. 41, 299-307; 2009). They do this
by comparing data on the abundance of more than 10,000 DNA
transcripts with whole-organism traits, such as fitness and lifespan,
in 40 fruitfly lines.

The researchers show that aggregates of genes correlate with distinct
characteristics in flies, and that these modules are connected, with
groups of genes associated with multiple phenotypic traits. This
elegant complexity is best conveyed by the figures in the paper, some
of which look as though they were lifted off the walls of a
modern-art gallery.

The group's work provides a post-genomic framework for dissecting the
intricate underpinnings of organismal biology. More importantly, the
paper demonstrates that key topics in traditional evolutionary
studies, such as heritability, and more recent concepts, such as
pleiotropy (whereby one gene affects multiple traits), are related.
As such, they must be considered together to build a complete
understanding of how selection acts through the phenotype to sculpt
the genome.

Discuss this paper at http://blogs.nature.com/nature/journalclub
POLICY


Climate law: Democrats in the 
US Senate unveiled their climate 
legislation on 30 September. 
Based largely on the Waxman- 
Markey bill passed by the House 
of Representatives in June,
it proposes a cap-and-trade
system to reduce greenhouse- 
gas emissions by 20% by 2020 
and 83% by 2050, compared 
with 2005 levels. An initial 
committee vote could come 
later this month. On the same 
day, the US Environmental 
Protection Agency proposed a 
rule that would require major 
industrial facilities to use “best 
available” technologies to reduce 
greenhouse-gas emissions. See 
go.nature.com/RWjAdj for more. 


Frozen grants: A ¥270-billion 
(US$3-billion) funding 
programme in Japan has
been put on hold because of a
wholesale budget freeze by the 
country’s new government. The 
Funding Program for World- 
Leading Innovative R&D on 
Science and Technology was 
created this spring by Japan’s 
former ruling party, the Liberal 
Democratic Party, as part of a
supplementary budget. Thirty 
research groups were quickly 
chosen to share the money ahead 
of the 16 September transfer of 
power to the Democratic Party of 
Japan, but none is yet assured of 
the promised funds. 


Chemical regulation: The US Environmental Protection Agency (EPA)
laid out White House-backed principles for a radical reform of US
legislation regulating toxic chemicals, at present controlled by the
1976 Toxic Substances Control Act. EPA administrator Lisa Jackson
said the act had proved to be “an inadequate tool” for protecting the
public. She wants to strengthen the EPA’s authority to clamp down on
dangerous chemicals, and for chemical manufacturers routinely to give
the agency toxicity data. The American Chemistry Council, which
represents US chemical manufacturers, says it welcomes the reform. A
congressional bill is expected soon.


ISLANDS SUFFER DUAL SHOCKS

Major earthquakes in the Pacific and Indian oceans struck Samoa and Indonesia last week. An earthquake
measured by the US Geological Survey as a magnitude 8.0 triggered a tsunami off Tonga and Samoa that killed at
least 190 people and left thousands homeless. The death toll for the magnitude-7.6 earthquake that hit just hours
later off the Indonesian island of Sumatra may reach thousands. Experts said that the two earthquakes were not
related to each other. See go.nature.com/qkxBhD for more.


Biosecurity: The US 
government should grade 
microorganisms and toxins 
according to their risk as 
potential “biothreat” agents, 
and regulate them accordingly. 
That was the recommendation 
of a National Research Council 
report released last week, 
entitled Responsible Research 
with Biological Select Agents and 
Toxins. Currently, research on 
82 human, plant and animal 
pathogens (called select agents) 
is monitored under a 1996 law 
that requires the same security 
procedures for all of them. The 
report also called for funding of 
regular, independent evaluation 
of the programme governing 
research into select agents. 


European research: Control of Europe’s research funds should be
devolved from the European Commission to agencies with “arms-length”
independence, an advisory board has suggested. In a report published
on 6 October, the European Research Area Board also said that
research’s share of the European Union's budget should triple to 12%
by 2030, with half of those funds going towards basic research. In
the most recent funding round, research was allocated €50 billion
(US$74 billion) for 2007-13, of which €7.5 billion went to basic
research. See go.nature.com/yVLwT3 for more.


NUMBER CRUNCH

12,000%

The potential rise in India's nuclear capacity, from 3.8 gigawatts
today to 470 gigawatts by 2050, in expansion plans announced by the
country's prime minister, Manmohan Singh.

(The Times)


FACILITIES


Dam settlement: After a bitter and lengthy controversy over water
management, four hydroelectric dams on the Klamath River in Oregon
and California will be removed to restore salmon runs. PacifiCorp,
the Portland-based utility in Oregon that owns the dams, announced
the draft agreement on 30 September after almost a decade of
negotiations with federal agencies, farmers, states, conservation
groups and Native American tribes. Removal of the dams will not begin
until 2020, and is dependent on government approval as well as full
federal and state funding.


BUSINESS


Firing frenzy: Sequenom, a 
biotechnology firm in San 
Diego, California, has cleared 
out its top executives after an 
internal investigation found
lax oversight of faulty research.
The company revealed in April 
that data supporting a prenatal 
screen for Down's syndrome 
were “mishandled” and could not 
be relied on (see Nature 459, 23; 
2009). Last week the company 
fired chief executive Harry Stylli 
and Elizabeth Dragon, senior 
vice-president of research and 
development. Chief financial 
officer Paul Hawran and another 
unnamed executive resigned, 
and three research scientists also 
had their contracts ended. 


RESEARCH


Research statistics: China has become the second-largest producer of
academic research papers in the world behind the United States,
according to a study produced annually for the UK government.
Leeds-based analysts Evidence Ltd revised its figures to show that
China's output overtook the major European Union states in 2006,
although on measures of citation share and research strength
(citation share across 10 different fields) it remained lower. The
achievement had not been noted until now, and follows an increase in
the volume of papers indexed by databases. India and Brazil show
signs of following China’s rise in the next decade.


SOUND BITES

“That is one of the things that wakes me up in the middle of the
night.”

Francis Collins, the newly minted director of the US National
Institutes of Health, tells Nature how he feels about trying to
ensure that the agency won't suffer financially when its stimulus
money is spent. See go.nature.com/h15ch6 for the full interview.


Crop warning: Developing countries could see crop yields fall
dramatically by 2050 if climate change is left unchecked, according
to a study released on 30 September by the International Food Policy
Research Institute in Washington DC. The report forecasts that wheat
yields from irrigated fields might fall by a third or more by 2050,
compared to a no-climate-change scenario, and that farmers in
southern Asia might see their wheat production almost halve. See
go.nature.com/VYzOWI for more.


Presidential visit: US President Barack Obama visited the National
Institutes of Health in Bethesda, Maryland, on 30 September
(pictured). Obama toured a National Cancer Institute lab, where he
was treated to video images of healthy and cancer-riddled brains, and
praised the research made possible by the agency's spending of
$5 billion out of the $10.4 billion it got in economic stimulus
funds. The agency had raced to disburse the money by that same day,
the end of the government’s fiscal year.




Carbon cuts: Carbon dioxide emissions could fall by 3% worldwide this
year because of the global economic crisis, the International Energy
Agency predicted in a teaser from its upcoming World Energy Outlook
2009 report. The excerpt was released on 6 October to coincide with
United Nations climate talks in Bangkok.


AWARDS

Nobel winners: Elizabeth Blackburn, Carol Greider and Jack Szostak
shared the 2009 Nobel Prize in Physiology or Medicine, for their
discoveries of how chromosomes are protected by telomeres and the
enzyme telomerase. The physics prize went to Charles Kao, for his
work on how light can be transmitted through optical glass fibres;
and to Willard Boyle and George Smith for their invention of the
charge-coupled device (CCD) sensor. The chemistry prize was yet to be
awarded as Nature went to press. See page 706 and
www.nature.com/news for more.


THE WEEK AHEAD

9 OCTOBER
NASA's Lunar Crater Remote Observation and Sensing Satellite will
crash into a crater near the Moon's south pole, in the hope of
disturbing and detecting ice.
http://lcross.arc.nasa.gov

15-16 OCTOBER
‘The ambitions of Europe in space’: European policy-makers,
financiers, space scientists and industrial representatives converge
on Brussels to discuss the region's space programme.
www.spaceconference.eu

15-25 OCTOBER
Canada's Perimeter Institute for Theoretical Physics (see Nature
461, 462-465; 2009) hosts (and webstreams) the ‘Quantum to cosmos’
festival in Waterloo, Ontario.
www.q2cfestival.com


BUSINESS WATCH 


In September, French company Arkema became 
the latest carbon-nanotube manufacturer this 
year to announce plans for a drastic scaling up 
of production. Despite the materials’ present 
reliance on the mixed fortunes of the automobile 
industry, the market for carbon nanotubes as 
raw materials looks set to grow rapidly. Revenues 
could reach US$500 million by 2015 (see chart), 
predicts Jurron Bradley of analysts Lux Research. 
“In 2009 we have seen a lot of players announce 
major expansions,” Bradley says. Market 
leader CNano, based in Santa Clara, California, 
announced in June that its Beijing facility is now 
producing 500 tonnes of multiwalled nanotubes 
annually; Arkema’s plant in Mont, France,
should be turning out 400 tonnes per year by
2011. If companies such as Germany's Bayer
MaterialScience, headquartered in Leverkusen,
and Tokyo-based Showa Denko follow through
with similar plans, global nanotube production
will have doubled by 2011 from around
800 tonnes per year at present.

Many of these raw nanotubes are multiwalled, 
and are used to make light, strong, composite 
materials. But a host of smaller companies — 
such as Nanocyl, based in Sambreville, Belgium 
— process them into intermediate products for 
antistatic coatings, for example, sensors for gas 
detection and electrode material for batteries — 
and, on the horizon, for touch-screen displays. 


THE RISE OF CARBON NANOTUBES 


Global carbon-nanotube revenue 
(raw materials), US$ millions 
NEWS


Fossil rewrites early human evolution 


Ethiopian find dates back 4.4 million years. 


A 17-year investigation into a fossilized early 
human skeleton from Ethiopia culminated last 
week with 11 papers published in Science. 

Detailed descriptions of the skeleton, of a 
fairly complete 4.4-million-year-old female, 
show that humans did not evolve from ancient 
knuckle-walking chimpanzees, as has long 
been believed. The new fossils of Ardipithecus 
ramidus — known as ‘Ardi’ — offer the first 
substantial view of the biology of a species 
close to the time of the last common ances- 
tor shared by humans and apes. Like modern 
humans, Ardi could walk upright (see depic- 
tion, right) and didn't use her arms for walking, 
as chimps do. Still, she retains a primitive big 
toe that could grasp a tree like an ape’s¹.

Previously, the oldest near-complete skele- 
ton of an early human was the 3.2-million- 
year-old Australopithecus afarensis skeleton 
known as Lucy, also from Ethiopia. Because 
Lucy had many traits in common with modern 
humans, she didn’t provide much of a picture of
the earlier lineage between apes and humans, 
says Alan Walker, a biological anthropologist 
at Pennsylvania State University in University 
Park. The new A. ramidus “is so much more
important, and strange”, he says.

The earliest Ardipithecus, A. kadabba, lived
around 5.8 million years ago in Ethiopia². The
other oldest known hominids are Orrorin
tugenensis, from about 6 million years ago in
Kenya³, and Sahelanthropus tchadensis, from at
least 6 million years ago in Chad⁴ (see graphic).

In addition to describing the fossils, the
Science papers provide details about the
geology and palaeoenvironment of the discovery
site, in the Afar desert 230 kilometres
northeast of Addis Ababa. The research
team, known as the Middle Awash
Project, involves 70 investigators, 47 of
whom are authors on the papers.

KNOWN HUMAN ANCESTORS (graphic): Ardipithecus (Ar. kadabba; Ar. ramidus, Ardi skeleton); Ethiopia, Kenya.

In 1992, team member Gen Suwa 
found the first specimen of A. ramidus 
near the Ethiopian village of Aramis. 
Within two years, enough fossils 
had been found to produce the first 
article that named and sketchily 
described the animal, from a total 
of 17 fossils⁵.

Some researchers have complained about how
long it has taken to publish work about the
fossils. But Berhane Asfaw, a co-director
of the Middle Awash Project at the Rift
Valley Research Service in Addis Ababa, says:
“We weren’t interested in how many papers
we could publish. Our interest was in the full
chain of information; that produces the power
of the work.”

From more than 135,000 vertebrate bone or 
tooth pieces, the team identified 110 as being 
from A. ramidus, representing a minimum of 
36 individuals. The fossils come from a sedi- 
ment layer sandwiched between two layers of 
volcanic rock known as tuff — each dated to 
4.4 million years ago, says a team led by Giday 
WoldeGabriel, of Los Alamos National Labora- 
tory in New Mexico. Fossils in the sediments 
include plants, pollen, invertebrates and birds, 
which helped to pinpoint the woodland envi- 
ronment where Ardi lived. 
Years of field work uncovered Ardi’s
skull, teeth, arms, hands, pelvis, legs
and feet, all of which had to be
painstakingly prepared. Ardi’s skull
was recovered crushed in more than
60 pieces that were broken and scattered
about. The bone was poorly fossilized,
so soft that each piece had to be
moulded in a silicone rubber cast
then digitized by computed
tomography scans.
Ardi’s hands and wrists don’t 
show several distinctive chimp 
characteristics, such as some larger 
bones and a tendon ‘shock absorber’ 
system to withstand bodyweight, says 
team member Owen Lovejoy of Kent 
State University in Ohio. The foot, 
with its big toe sticking out sideways, would 
have allowed Ardi to clamber in trees, walk- 
ing along branches on her palms. And her 
teeth show no tusk-like upper canines, which 
most apes have for weapons or display during 
conflict. “This is a major feature showing that 
Ardi is not in the lineage of modern chimps,” 
Suwa says.
Rex Dalton 
1. White, T. D. et al. Science 326, 75-86 (2009).
2. Haile-Selassie, Y. Nature 412, 178-181 (2001).
3. Senut, B. et al. C. R. Acad. Sci. Paris Ser. IIa 332, 137-144 (2001).
4. Brunet, M. et al. Nature 418, 145-151 (2002).
5. White, T. D., Suwa, G. & Asfaw, B. Nature 371, 306-312 (1994).
For a longer version of this story, see 
go.nature.com/gSAuY5 
Winners in triplicate: Carol Greider, Elizabeth Blackburn and Jack Szostak.


Chromosome protection scoops Nobel 


Prize for physiology or medicine awarded for uncovering role of telomeres. 


Three US scientists have won the Nobel Prize 
in Physiology or Medicine for discovering the 
structure of molecular caps called telomeres 
and working out how they protect chromo- 
somes from degradation. Their discoveries in 
cell biology during the 1980s and 1990s opened 
new avenues of work, in ageing and in cancer 
research, which are still highly active today. 

The prize, announced on 5 October, is shared 
equally between Elizabeth Blackburn at the 
University of California, San Francisco, Carol 
Greider of the Johns Hopkins Medical School 
in Baltimore, Maryland, and Jack Szostak at 
Harvard Medical School in Boston, Massachu- 
setts. The three have already won numerous 
prizes for their work, including sharing one of 
the 2006 Lasker awards, often considered to be 
a forerunner of the Nobel prize. 

Their research revealed a fundamen- 
tal aspect of how DNA, packed into 
chromosomes, is copied in its entirety 
by the DNA polymerase enzyme during 
cell division. The ends of the chromo- 
somes are capped by telomeres, long 
thought to have a protective function 
(see ‘Chromosome caps’). Without 
them, the chromosomes would be short- 
ened during each cell division, because DNA 
polymerase is unable to copy to the very end of
one of the two DNA strands it is replicating. 
In the early 1980s, after their fortuitous 
meeting at a Gordon Research Conference in 
1980, Blackburn and Szostak discovered 
that telomeres include a specific DNA 
sequence. Fired up by the novelty of 
each other's work, they devised experi- 
ments that seemed crazy at the time, even 
to themselves. Szostak took the telomere
sequences that Blackburn had identified in the
protozoan Tetrahymena thermophila and coupled
them with mini-chromosomes that he inserted
into his own preferred model organism, yeast.


CHROMOSOME CAPS

Telomeres form protective caps at the ends of
chromosomes, and are built from a repeating DNA
sequence constructed by the enzyme telomerase.
The DNA sequence shown is from the
Tetrahymena telomere.


Cross-species effect
The sequence was able to protect the
chromosomes in this different species¹. It was
soon found that the protection conferred by
telomeres is a fundamental biological mechanism
present in nearly all animals and plants. Szostak
and Blackburn suspected that an unknown
enzyme must be involved. On Christmas
Day in 1984, Greider, then Blackburn's
graduate student, saw the first evidence
that this enzyme, which Greider and
Blackburn named telomerase, was responsible
for constructing telomere DNA².


They worked out that telomerase provides 
a platform enabling DNA polymerases to 
copy the entire length of the chromosome 
without missing the ends. Greider and Black- 
burn also showed that telomerase contains a 
key RNA sequence that acts as a template for 
the telomere DNA³, which attracts proteins
to form a protective cap around the ends of 
the DNA strands. 

Telomeres themselves shorten with 
repeated cell division, making up a key 
part of the cell’s ageing mechanism. Low 
telomerase activity and telomere shorten- 
ing speed up ageing, whereas incessantly 
dividing cancer cells often have high 
telomerase activity and maintain their 
telomere length. Cancer therapies directed 
against telomerase are now being tested in 
clinical trials. 

But there is still a lot of basic biology to 
discover — such as how telomerase activity 
is regulated at individual telomeres, and how 
telomeres manage to avoid the attentions of 
DNA repair enzymes which seek out breaks 
in DNA and restitch the torn ends. 

Blackburn and Greider become only the 
ninth and tenth female scientists to win the 
physiology or medicine prize since it was 
first awarded in 1901, and it is the first time
that two women have been recognized in 
a single prize. Indeed, telomere research is 
unusually dominated by women. “It is hard 
to find a male among us,” says David Shore,
a cell biologist at the University of Geneva, 
Switzerland. “And two main reasons are Liz 
and Carol — they created the field and have 
been role models.” 

Blackburn has also been involved in 
the politics of science, serving on the US 
President’s Council on Bioethics from 
2002 until she was dropped in 2004 after 
criticizing the restrictions on human 
embryonic stem-cell research imposed by 
then President George W. Bush. 

Lea Harrington, Greider’s first graduate 
student, who is now at the Wellcome Trust 
Centre for Cell Biology at the University of 
Edinburgh, UK, says that her four years in 
Greider’s laboratory at Cold Spring Harbor, 
New York, were “electric. We all realized 
what an exciting time it was — so many 
questions being answered about the com- 
position of telomerase, how it worked and 
its relevance to human biology.”
Alison Abbott 


1. Szostak, J. W. & Blackburn, E. H. Cell 29, 245-255 
(1982). 

2. Greider, C. W. & Blackburn, E. H. Cell 43, 405-413 
(1985). 

3. Greider, C. W. & Blackburn, E. H. Nature 337, 331-337 
(1989). 


Nobel Prize in Physics awarded to light pioneers


Two technologies that revolutionized
science, computing and communication have
secured their developers a share of the Nobel
Prize in Physics.

Charles Kao of the Chinese University of 
Hong Kong has won half the prize for his role 
in developing fibre-optic cables. The other 
half is shared by Willard Boyle and George 
Smith of Bell Laboratories in Murray Hill, 
New Jersey, for their development of the 
charge-coupled device (CCD), an electronic 
chip that converts light into a 
digital signal. 

In 1969, Boyle and Smith
developed a chip that could
transform light into an
electronic signal. The duo
used newly discovered metal
oxide semiconductors that
could convert photons into a
flow of electrons, which could
be read from the edges of
the chip and used to recreate
the image. The ability to
digitally capture light has
found application in nearly every field of
science, particularly astronomy. “Basically,
they revolutionized optical astronomy,”
says Mark Casali, head of instrumentation
at the European Southern Observatory in
Garching, Germany. Before the advent of
CCDs, astronomers were imaging stars
using photographic plates, which were less
sensitive and less precise than their digital
successors, Casali says. Using CCD cameras,
astronomers have been able to discover faint
galaxies and even see fluctuations in a star’s
light created by an orbiting planet.

The detectors also made space-based
astronomy a reality, says Matt Mountain,
director of the Space Telescope Science
Institute in Baltimore, Maryland, which
coordinates science for the Hubble Space
Telescope. “It made telescopes like the
Hubble possible,” he says. “You could now
put large electronic detectors in space that
could beam down digital pictures of some of
the faintest objects human beings have ever
seen.”

Fibre optics has had an
equally impressive impact
on science, not least by
facilitating collaboration
on a global scale. But the
transmission of data over
thousands of kilometres
seemed a distant dream when
Kao first began his work on
fibre-optic cables. Back then,
fibres could carry light only
a few metres by total internal
reflection before the signal
faded. Kao and his colleagues at Standard
Telecommunication Laboratories in Harlow,
UK, worked out that impurities, mainly iron
ions, were causing the loss. Kao identified
an alternative material, fused silica,
that could carry light over much greater
distances without significant loss. The work
ultimately led to the billion-kilometre-long
network of fibre-optic cables that span the
globe today.

Charles Kao: fibre-optic cables.

Willard Boyle and George Smith invented charge-coupled devices.

Fibre optics will also have a pivotal role in
the world’s largest science experiment, the
Large Hadron Collider (LHC) at
CERN, Europe’s particle-physics
centre near Geneva, Switzerland.
The LHC’s largest detectors
create around a million gigabytes
of raw data every second. The
cables then shepherd the data
to nearby servers and on to
thousands of scientists in 33
countries through an ultrafast
computer grid. “The whole
infrastructure is based on optical
fibre,” says Ian Bird, the grid’s
project leader. “There’s no way
that our data rates could be
sustained without it.”
Geoff Brumfiel


X-ray free-electron lasers fire up 


California's project has the lead as its facility goes live, but Europe aims for its own rapid-fire device. 


HAMBURG 

Heinz Graafsma is tired of the “pretty, but use- 
less” images of proteins that regularly adorn 
the pages of journals such as Nature. “Chem- 
istry depends on changes,” says Graafsma, the 
head of detectors for photon science at DESY, 
Germany’s high-energy physics laboratory in 
Hamburg. “The static world is boring.” 

Get ready for the movies. A new generation 
of light sources — including one just com- 
pleted in California, one under construction 
in Japan and one being built outside Graafsma’s 
office — are getting set not only to put atoms 
and molecules under the spotlight, but also to 
illuminate their dynamics. 

The devices, called X-ray free-electron 
lasers, produce flashes of X-ray light with ang- 
strom-level wavelengths — small and coherent 
enough to image individual atoms. The flashes 
are also more intense than any created before 
— stuffed with enough photons to create and 
study extreme states of matter such as plasma. 

But perhaps most importantly, the bursts of
light are short: just hundreds of femtoseconds
long, the time it takes for light to cross a
human hair. Pulses as brief as this can record
functions, not just forms: the folding of a
protein, the action of a catalyst, the splitting
of a chemical bond.
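
The comparison between pulse length and the width of a human hair can be checked with rough arithmetic. The sketch below is illustrative only: it multiplies the speed of light by a pulse duration to get the distance light covers in that time. The function name and the two sample durations are assumptions made for this example; the article itself says only "hundreds of femtoseconds".

```python
# Distance that light covers during a pulse of a given duration.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value, fixed by the SI definition


def light_travel_micrometres(pulse_fs: float) -> float:
    """Return the distance, in micrometres, that light travels in pulse_fs femtoseconds."""
    seconds = pulse_fs * 1e-15           # femtoseconds to seconds
    metres = SPEED_OF_LIGHT_M_PER_S * seconds
    return metres * 1e6                  # metres to micrometres


# Sample durations are illustrative assumptions, not figures from the article.
for pulse_fs in (100, 300):
    distance = light_travel_micrometres(pulse_fs)
    print(f"{pulse_fs} fs -> about {distance:.0f} micrometres")
```

A 100-femtosecond pulse corresponds to roughly 30 micrometres of light travel, which is comparable to the tens of micrometres usually quoted for the width of a human hair.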

“That is the revolutionary thing,” says 
Joachim Stöhr, director of the Linac Coherent
Light Source (LCLS) at the SLAC National
Accelerator Laboratory in Menlo Park, Cali- 
fornia. The US$420-million machine, the first 
free-electron laser in the world to operate at 
wavelengths this short, began its first experi- 
ments last week. 

The new devices will outgun the workhorse
of the past half-century: the synchrotron, in
which beams of electrons, whipped around in a
circle, emit bursts of X-ray radiation. Interest in
synchrotrons is still high; the number of users
at the four major US synchrotron facilities rose
from 6,009 to 8,492 between 2000 and 2008.
But these facilities are starting to reach
fundamental limits. Some experiments require many
photon ‘hits’, and these can require weeks, if
not months, at even the brightest synchrotrons.
In addition, synchrotron pulses are limited to
the picosecond regime, a thousand times longer
than free-electron laser bursts. As with a camera
set to a slow shutter speed, images of
always-jittery molecules end up fuzzy.

In California, SLAC researchers calibrate the magnets at the Linac Coherent Light Source.

Just as synchrotron rings were first built for
particle smashing, free-electron lasers also
depend on a tool borrowed from particle
physics: the linear accelerator or ‘linac’. The LCLS
uses the 43-year-old SLAC linac, in which
bunches of electrons are accelerated through
a 1-kilometre-long tunnel along a path so
tightly focused that Earth’s curvature and weak
magnetic field have to be taken into account.


Science by the femtosecond

Atomic physics: exploring
how X-rays rip electrons from
the inner shells of atoms.
Warm dense matter: creating
and studying states of matter
that lie between solids and
plasmas, found in the interiors
of planets and cool stars.
Single particles and
biomolecules: eliminating
the need for crystallography,
which is the main bottleneck
in describing complex
biological structures.
Femtochemistry: making
movies of chemical bonds
being made and broken,
of crystals melting and of
nanometre-scale droplets
nucleating.
Nanometre-scale dynamics
in condensed matter:
probing how proteins fold and
polymers twist. E.H.

The electron bunches then reach a 130-metre- 
long section of undulators — magnets that 
‘wiggle’ the electrons and coax them into 
emitting X-rays. The wiggles are tuned to the 
wavelength of the emitted light, creating a feed- 
back mechanism: electromagnetic fields from 
the X-rays act on the electrons, concentrating 
them into small, tight groups that emit ampli- 
fied light in unison. 

On 10 April, LCLS engineers successfully 
tested this crucial idea, which was first pro- 
posed in 1971 (J. M. J. Madey J. Appl. Phys. 42, 
1906-1913; 1971). LCLS project director John 
Galayda remembers seeing a sudden surge in 
light on that day as the electron beam passed 
the tenth undulator and the amplification 
began to occur. 

Now it’s time to start using the beam. After 
a summer spent commissioning the first 
instrument, the first team arrived at 7:30 a.m. 
on 1 October for five straight days of data col- 
lecting. A gas jet shot atoms of neon into the 
oncoming beam pulses so that the scientists 
could study what happens when electrons from 
the atom’s innermost shell are stripped away. 
“It’s like peeling an onion from the inside out,”
says instrument scientist John Bozek.

One of the most anticipated applications
for these tiny spotlights (see ‘Science by the
femtosecond’) will be imaging single
biomolecules. At synchrotrons, proteins have to
be crystallized so that the lattice-like structure
of many identical proteins clarifies the image
made by the relatively incoherent X-rays. But
some targets, such as viruses and the proteins
found in cell membranes, are notoriously
difficult to crystallize.

Expanding the number of described protein 
structures will be important. But DESY direc- 
tor Helmut Dosch says that more surprises will 
come from descriptions of how those struc- 
tures move. Many disorders, such as Alzhe- 
imer’s disease, arise when there is a problem 
in the way in which proteins fold. “You have to 
understand what drives the folding,” he says.

There is a drawback to free-electron lasers: 
there are few chances to work with a piece of the 
spotlight. At circular synchro- 
trons, light can be siphoned off 
to experiment stations at regular 
intervals — the new PETRA III 
ring at DESY, for example, has 
14 stations that can hold up to 30 
instruments. But the straight- 
shot LCLS has just one station holding a single 
instrument, and work must proceed in series. 
“The available time is small and the amount of 
exciting science is large,” says Jerome Hastings, 
head of the LCLS science department. Eventu- 
ally, SLAC plans to install switching stations, so 
that light pulses can be diverted to each of six 
planned experiments. 

That’s one reason why scientists at DESY 
think there will still be plenty of work left to 
do when the European X-Ray Free-Electron 
Laser (XFEL) is completed in 2014. (Japan also 
hopes to complete a free-electron laser in 2010, 
next to its SPring-8 synchrotron in Harima.) 
An agreement, expected by the end of this 
year, will formalize the 13-nation, €1.1-billion 
(US$1.6 billion) XFEL project. 

The XFEL will navigate the terrain beneath
urban Hamburg. Germany is footing up to 60%
of the construction costs and is pressing ahead
with construction, which began in January.

In Germany, work is well under way on the European ‘XFEL’ light source.

In August, men in orange coveralls stacked 
reinforcing bars and electric cabling in a 
40-metre-deep chasm, having scooped away 
hundreds of thousands of cubic metres of dirt 
and the occasional Second World War mortar 
shell. From this pit at the edge of DESY, tun- 
nels bearing the linear accelerator will burrow 
northwest to the town of Schenefeld, 3.4 kilo- 
metres away. 

The tunnelling is the most expensive component
of the XFEL’s construction, and much
of the 5-year lead that the LCLS has over the 
XFEL can be attributed to the recycling of 
SLAC’s existing linear accelerator. Galayda 
says that the Californian project would have 
cost at least $300 million more if 
the team had had to dig a tunnel 
and build an accelerator from 
scratch. 

But the XFEL has its own 
trump card: its accelerator will 
use cryogenically cooled super- 
conducting cavities, allowing the electron 
bunches to be fired off more quickly. Whereas 
the LCLS is limited to 120 bursts of light per 
second, the XFEL will release 30,000. 

Massimo Altarelli, XFEL's designate direc- 
tor, says the machine's superior repetition rate 
will be particularly important in experiments 
involving dilute targets — a biomolecule float- 
ing in a solvent, say — where the chance of 
registering a photon ‘hit’ may be slim. But LCLS
scientists contend that, most of the time, all of 
the extra firepower will be wasted: the light will 
be discarded, absorbed by lead walls. 
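
The repetition-rate argument can be made concrete with a rough estimate. The sketch below is an illustration under stated assumptions, not a model of either machine: only the two pulse rates (120 and 30,000 per second) come from the article, while the per-pulse hit probability, the target number of hits and the function name are hypothetical values chosen for this example. It also assumes that every pulse can be used, which is exactly what the LCLS scientists dispute.

```python
LCLS_PULSES_PER_S = 120      # from the article
XFEL_PULSES_PER_S = 30_000   # from the article


def seconds_to_collect(target_hits: int, pulses_per_s: float, hit_probability: float) -> float:
    """Expected time to record target_hits if each pulse scores a hit with hit_probability."""
    expected_hits_per_s = pulses_per_s * hit_probability
    return target_hits / expected_hits_per_s


HIT_PROBABILITY = 0.001  # hypothetical: one pulse in a thousand meets a molecule
TARGET_HITS = 10_000     # hypothetical: hits needed for one data set

lcls_s = seconds_to_collect(TARGET_HITS, LCLS_PULSES_PER_S, HIT_PROBABILITY)
xfel_s = seconds_to_collect(TARGET_HITS, XFEL_PULSES_PER_S, HIT_PROBABILITY)

print(f"LCLS: {lcls_s / 3600:.1f} hours")
print(f"XFEL: {xfel_s / 60:.1f} minutes")
print(f"Ratio: {lcls_s / xfel_s:.0f}")
```

Whatever hit probability is assumed, the ratio between the two collection times depends only on the pulse rates, 30,000 / 120 = 250, provided the experiment can actually make use of every pulse.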

Stöhr says he is glad that the LCLS is up and
running first, but adds that there will be plenty 
of important science for the XFEL. “There's 
more than one winner,” he says.
Eric Hand 
Choren Industries has opened
a plant in Germany to produce
its synthetic biofuel SunDiesel.

The last of four weekly articles looks at making liquid fuels direct from biomass.


Petrol might yet survive the green revolution.
Some investors are taking seriously the
concept of ‘green gasoline’: transforming the
woody remains of plants into exact replicas of
today’s transportation fuels.

Many see promise because, unlike other 
biofuels, this product would blend smoothly 
into today’s petrol-driven infrastructure. “This 
is one I like. It’s got a chance of making it,” says
Lanny Schmidt, a chemical engineer who works 
on combustion processes and alternative fuels at 
the University of Minnesota in Minneapolis. 

Yet this ‘biomass-to-liquid’ approach is one 
of the least known in the biofuels portfolio, and 
barely makes a dent in alternative fuel quotas. 
A report by the US Foreign Agricultural Service 
estimates that in 2009 biomass-to-liquid fuels 
will make up just 2,000 tonnes of oil equivalent 
for road transportation in the European Union. 
The figure for bioethanol is 2 million tonnes, 
and for conventional fossil fuels it is more than 
310 million tonnes. The report concludes that 
the technology for biomass-to-liquid fuels is 
“in its infancy and will take some years before 
it reaches a significant volume”. 

At least one major oil company has dabbled in
the field. In 2008, Royal Dutch Shell invested an
undisclosed amount in Virent Energy Systems,
a company based in Madison, Wisconsin. The
collaboration aims to improve Virent’s
processes to take sugars generated from cellulosic
waste and catalytically react them with water to
produce fuel molecules. “It’s a premium
high-octane [petrol] we’re generating,” says Randy
Cortright, the company’s founder and chief
technical officer, who co-invented the
technology with chemical engineer James Dumesic of
the University of Wisconsin-Madison.

But Virent currently produces about a litre 
a day, and getting that to more significant 
amounts will take time. “By the end of this year 
we will have a larger-scale pilot plant capable of 
40,000 litres a year,” says Cortright. A full-scale 
plant is some three to six years away. 

The technologies required are known but
need refinement: they rely on breaking down
biomass into smaller molecules, such as sugars,
which can then be handled in conventional
refineries to produce petrol, diesel or jet fuel.
But the catalysts needed to convert the
biomass to useful hydrocarbons
are still being developed, as are
ways to break down the
biomass so it can be processed.

George Huber, a chemi- 
cal engineer from the University of Massa- 
chusetts in Amherst, heats biomass so that it 
decomposes and releases volatile sugars, which 
are then passed over a zeolite catalyst to form 
the aromatic molecules benzene, xylene and 
toluene. These molecules are also extracted 
from crude oil, and xylene and toluene can be 
blended with other substances to make petrol. 
A mixture of these molecules can be produced 
for less than US$0.46 per litre, says Huber, who 
has started a company called Anellotech, to 
develop the technology further. 

Still, investors need patience, he says: “You 
make your money on volume.” To go from lab- 
based processes, such as his, to the pilot stage 
takes three or four years; to scale up to a large 
demonstration plant will take another three or
four years and hundreds of millions of dollars, 
he estimates. 

Nevertheless, some money is flowing into the 
field. Virent has raised $30 million in venture 
financing and has commitments of $40 million 
from industry and government funding. Amyris 
Biotechnologies, a firm in Emeryville, Califor- 
nia, that engineers microbes to increase biofuel 
yield, has amassed $140 million since 2006 from 
high-powered investors including Khosla Ven- 
tures and Kleiner Perkins Caufield and Byers. 

And late last month LS9, based in South
San Francisco, attracted $25 million from oil
and gas company Chevron.
LS9 uses specially designed
microbes to chew up biomass
to produce hydrocarbons that
can be refined as usual.

“The promise of these fuels 
is that oil companies will be able to use them 
very easily,” says Harry Boyle, an analyst at 
London-based New Energy Finance. “The exit, 
if you’re a venture-capital player, is huge and
very exciting.” 

But it’s a long way to a profitable exit. For 
instance, biofuels producer Dynamotive Energy 
Systems, based in British Columbia, Canada, 
posted net losses of Can$1.5 million (US$1.4 
million) in the second quarter of 2009. 

Some are looking to federal loans to help out. 
Terrabon, based in Houston, Texas, announced 
in July that it had made organic salts from 
biomass, then turned them into petrol with 
Valero Energy Corporation, a refining com- 
pany headquartered in San Antonio, Texas. 




Terrabon, which uses technology developed 
at Texas A&M University in College Station, 
plans to build a bigger plant in Port Arthur, 
Texas, that can process 55 tonnes of biomass a 
day, producing 4.9 million litres of fuel a year. 
It has applied for a $25-million grant from the 
US Department of Energy to build this plant, 
but if it doesn’t get the grant it will invest even
more itself and make the plant even larger — to 
process up to 220 tonnes of biomass each day. If 
paying for the whole plant, says chief financial 
officer Malcolm McNeill, “you might as well go 
to the real size”. 
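
The plant figures quoted above imply a fuel yield per tonne of biomass, which can be worked out as a rough check. The sketch below is illustrative: the 55 tonnes a day, 4.9 million litres a year and 220 tonnes a day come from the article, while the assumptions that the plant runs 365 days a year and that the yield stays the same at the larger scale are not stated in the source. The function and variable names are made up for this example.

```python
DAYS_PER_YEAR = 365  # assumption: the plant runs every day of the year


def litres_per_tonne(litres_per_year: float, tonnes_per_day: float) -> float:
    """Fuel yield implied by an annual output and a daily biomass throughput."""
    return litres_per_year / (tonnes_per_day * DAYS_PER_YEAR)


# Figures from the article: 55 tonnes of biomass a day, 4.9 million litres a year.
planned_yield = litres_per_tonne(4.9e6, 55)

# Assumption: the same yield holds at the larger 220-tonne-a-day plant.
scaled_output = planned_yield * 220 * DAYS_PER_YEAR

print(f"Implied yield: about {planned_yield:.0f} litres per tonne")
print(f"Larger plant: about {scaled_output / 1e6:.1f} million litres a year")
```

Under these assumptions the larger plant would produce four times the planned output, simply because 220 tonnes is four times 55 tonnes; real plants rarely scale this cleanly.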

Meanwhile, the processing company UOP, 
based in Des Plaines, Illinois, has developed a 
pyrolysis technique that heats biomass to release 
oil. More work is needed to develop that oil 
into a fuel, but the technology is already being 
licensed by UOP’s joint venture with Cana- 
dian company Ensyn Technologies in Ottawa, 
Ontario. “What this technology has lacked is 
the economic drivers to make it happen,” says 
Graham Ellis, UOP’s business manager for 
renewable energy and chemicals. UOP wants to 
help existing refineries to license its upgrading 
technology to use in existing infrastructure. 


At Virent, researchers are engineering microbes 
to increase biofuel yield. 


In Germany, the car-maker Volkswagen, 
based in Wolfsburg, is leading a €13.6-million 
(US$20-million) project intended to eventu- 
ally produce 200,000 tonnes per year of liquid 
fuels from biomass. The processing will be 
done by Choren Industries in Freiberg. Choren 
has separately amassed investments of €140 
million, mainly from individual investors, 
although minority shareholders include Shell 
Deutschland Oil, Daimler and Volkswagen. It
is now commissioning a new plant in Freiberg
that will have a nominal capacity of 18 million
litres of synthetic biofuel per year, all of which
will be sold to Shell.

Some producers think they can eventually 
become competitive by offering a lower-cost 
product than many other first-generation bio- 
fuels. Raw-material costs for synthetic biofuels, 
says Choren spokeswoman Ines Bilas, can be 
around 40% of total costs, compared with nearly 
90% for biodiesel made from rapeseed oil. 

The fuel’s adaptability may also help it to 
catch up with other, more established biofuel 
alternatives. “You really can make [petrol] from
sorghum or municipal waste,” says McNeill.

But for now, its future rests with process 
engineers and how well they can streamline 
the path from woody waste to liquid fuel.
Katharine Sanderson 


Correction 

The News story ‘Climate burden of refrigerants 
rockets’ (see Nature 459, 1040-1041; 2009) cited 
an incorrect year for when hydrofluorocarbon 
emissions were predicted to reach between
5.5 billion and 8.8 billion tonnes of carbon dioxide
equivalent annually. The year is 2050, not 2010.

Genome-wide association studies have identified hundreds of genetic clues to disease.
Kelly Rae Chi looks at three to see just how on-target the approach seems to be. 


Five years ago human geneticists rallied around an
emerging concept. Technology had granted the abil- 
ity to compare the genomes of individuals by looking 
at tens of thousands of known single-letter differences 
scattered across them. These differences, called single 
nucleotide polymorphisms or SNPs, served as reference 
points or signposts of common variation between indi- 
viduals. The idea was that common variants in the genome 
might contribute to the genetics of common diseases. 

Genome-wide association studies (GWAS) could scan 
SNPs in thousands of people, with and without a disease. 
When a DNA variant can be associated with the risk of 
developing a disease, it signals that something in that area 
of the genome might be partly responsible. With such ‘hits’ 
would presumably flow a better mechanistic understanding 
of disease, genetic-testing abilities and even treatment. 
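
As a rough illustration of what a single GWAS 'hit' involves, the sketch below runs a basic allelic chi-square test for one SNP in cases and controls. It is a minimal sketch, not the method of any study described here: the allele counts are invented, the function name is hypothetical, and the 5e-8 cutoff is the conventional genome-wide significance threshold rather than a figure from this article.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [case_alt + case_ref, control_alt + control_ref]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Conventional genome-wide threshold (an assumption, not taken from the article).
GENOME_WIDE_THRESHOLD = 5e-8

# Invented example: 2,000 cases and 2,000 controls, two alleles per person.
chi2, p_value = allelic_chi_square(1400, 2600, 1200, 2800)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
print("genome-wide significant:", p_value < GENOME_WIDE_THRESHOLD)
```

With these made-up counts the variant looks strongly associated when tested on its own, yet it falls short of the genome-wide threshold. That is one reason such scans need thousands of participants.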

“Many researchers really grabbed on to the common 
variant hypothesis, and in some cases it worked,” says 
Jonathan Haines, director of the Vanderbilt University 
Medical Center’s Center for Human Genetics Research in 
Nashville, Tennessee. But, he adds, “it hasn't panned out to 
be as pervasive an explanation as we thought”. 

Here are the stories of three hits. One provides a near 
perfect example of the positive outcome that this sort of 
unbiased approach can have. One reveals that without bio- 
logical context the findings can be hard to interpret. And the 
third demonstrates that GWAS in their current form can't 
cope well with some common traits. 


A direct hit in haemoglobin 
In 2007, researchers reported on genome-wide scans of 
healthy adults looking for SNPs associated with very high or
very low levels of fetal haemoglobin. Among several hits were
variants of a gene on chromosome 2 called BCL11A (ref. 1). 
This finding, quickly replicated in multiple populations, 
generated a lot of excitement. 

Fetal haemoglobin is a remnant of embryonic 
development. For most people, the fetal version of this cru- 
cial oxygen-carrying protein drops off after birth as the adult 
version kicks in. Some people retain relatively high expres- 
sion, which seems to have no effect in healthy adults. But for 
patients with blood disorders such as sickle-cell disease and 
β-thalassaemia, those expressing high fetal-haemoglobin
levels can be protected from some of the nastier ravages of 
the disease, such as leg ulcers, severe pain and even death. 

GWAS findings often just provide the signpost, a rough 
coordinate for a causal gene. The SNP signal is often outside 
gene sequences. The variants in BCL11A were a direct hit in 
a gene, however, and a surprising gene at that. The protein 
it codes for, which controls the expression of other genes, 
had been associated with cancer progression, but never with 
haemoglobin production. A mouse model had even been 
made in which the gene had been knocked out, but until 
the GWAS no one had looked at its regulation of blood. 
“Nobody would have ever dreamed that a gene like this 
would have any regulatory role in fetal haemoglobin,” says 
Martin Steinberg, a haematologist at the Boston University 
School of Medicine in Massachusetts. Last year, he and his 
colleagues replicated the BCL11A finding in three different 
populations with haemoglobin disorders (ref. 2).

Of course, there was functional work to be done. Stuart 
Orkin’s lab at Harvard Medical School in Boston reduced 
expression of BCL11A in cultured blood progenitor cells 
from humans. Fetal haemoglobin expression went up,
suggesting that the gene normally acts as a repressor (ref. 3). In a
follow-up study (ref. 4), the group showed that the gene controls
the silencing of fetal haemoglobin during development in 
mice. 

But the exact switch mechanism has not been solved, says 
Orkin. His group has gone on to look for proteins and other 
genes that the product of BCL11A binds to or influences. 
Results from these studies will inform research that looks 
for molecules that would interfere with the gene’s expres- 
sion or function and thus serve as potential therapies to 
activate synthesis of fetal haemoglobin in people with blood 
disorders. 

Steinberg, for his part, hopes to use these and other GWAS 
findings to refine a computational tool that predicts disease 
severity or death in people with sickle-cell disease. The hope 
is to intervene earlier and with more specific treatments. 


The verdict: even those who have been generally pessi- 
mistic about the outcomes of GWAS consider the BCL11A 
find a win for science. “It’s a tour de force illustration of the 
value of GWAS,” says David Goldstein, a geneticist at the 
Duke Institute for Genome Sciences & Policy in Durham, 
North Carolina. “You learn something new, you understand 
the mechanism, and it’s biologically and clinically 
important.” As winners go, however, Goldstein
says it is on a short list.

In some ways, says Steinberg, the GWAS 
provided a lucky hit. The researchers built 
on years of evidence that fetal haemo- 
globin has a powerful effect on the severity 
of sickle-cell disease and β-thalassaemia.
It was the clear physiological signal — 
quantity of fetal haemoglobin — that helped 
researchers to design the GWAS. Other genes 
and pathways will be found to affect the sever- 
ity of the disorders, but probably none with 
the same force as fetal haemoglobin. “We're 
not going to find another fetal haemoglobin,”
Steinberg says.


Scoping schizophrenia 
Schizophrenia genetics has been a mire of 
false starts. Scores of candidate gene associa- 
tion studies had identified promising targets, but 
few held up to further scrutiny. So the excite- 
ment around approaching the disease in an 
unbiased genome-wide study was high. 
But the first four schizophrenia GWAS 
reported no statistically significant asso- 
ciations. Then, in research published 
last year, researchers performed scans 
in roughly 500 people with the disorder 
and 3,000 healthy controls. When 12 of 
the hits that turned up were examined in 
16,000 more individuals, a signal started to 
emerge. Three variants were significant, but 
only one of them was in a gene, ZNF804A, 
which encodes a protein with unknown 
function (ref. 5).

Having a potential candidate gave researchers 
something concrete to work with. One group took
115 healthy people, a little less than half of whom had two
copies of the high-risk ZNF804A variant, and compared their 
brain activity using functional magnetic resonance imaging 
(fMRI), a method that reveals local blood oxygenation and 
presumably electrical activity in the brain in real time. Those 
with the variant, they found, had abnormal connectivity 
between certain brain areas, impairing “the degree in which
they talk to one another”, says Andreas Meyer-Lindenberg,
the director of the Central Institute of Mental Health in Mannheim,
Germany, who led the study (ref. 6). Healthy adults with the
variant were showing schizophrenia-like brain activity even 
though they showed no outward signs of disease. 

Combining genetic and brain-imaging data to study 
psychiatric disease is not new. Since 2001, researchers 
have used the strategy to link imaging data to candidate 
gene findings in schizophrenia, depression and autism. 
Meyer-Lindenberg’s study is the first to use a genetic locus
identified through GWAS for follow-up with fMRI. “We've
now applied [the technique] to a variant that has definitive
support as being a schizophrenia risk gene. That wasn't
available before,” says Meyer-Lindenberg.

Part of the problem when seeking schizophrenia-related
genes is that, unlike fetal haemoglobin levels, for example,
the definition of the trait can differ both within and between
studies. Also, the spectrum and severity of schizophrenia
symptoms vary between individuals and are
sometimes subjective from a clinical perspective.


That’s why fMRI is attractive. The researchers
hope to get closer to quantitative measures
of psychiatric disorders. “It makes sense to
have a biological level of analysis on which
these genetic associations can be studied,”
says Daniel Weinberger, the director of the 
genes, cognition and psychosis programme 
at the National Institute of Mental Health 
in Bethesda, Maryland, who pioneered the 
method in the 1990s. 
The ZNF804A association from GWAS has
been replicated in some studies but not others.
And there are few clues to the mechanism by
which this gene might contribute to brain 
connectivity. It was the group of Michael 
O’Donovan, a professor of psychological 
medicine at Cardiff University, UK, that 
made the initial discovery using GWAS. The 
team is now carrying out a series of experiments
to determine which DNA sequences
and other proteins the gene's product binds, and how variants
might alter gene expression.


The verdict: some have doubts about the combined 
assault of GWAS and fMRI on psychiatric illness. 
Imaging data are not the best quantitative trait,
says Goldstein: because one three-dimensional
fMRI image can contain more than 50,000
picture elements of data, a single trait can be
defined in multiple ways. “There hasn't been
a sufficient consistency in how those phenotypes
are defined,” he says. Weinberger and
others contend that the imaging paradigms are
well established before they are used in imaging
genetics research. “I think it’s very important that
the phenotypes are well validated — that they 
are themselves heritable, and that they're related 
in some way to the underlying neural circuitry,” 
says Weinberger. 

Reviews from others studying schizophrenia are 
somewhat lukewarm. Kari Stefansson, chief execu- 
tive of the Icelandic biopharmaceutical company 
deCODE Genetics in Reykjavik, says he’s not com- 
pletely convinced. “In schizophrenia, the imaging 
differences are subtle,” he says. Nevertheless, he 
plans to study differences in brain morphology 
using imaging and GWAS, in people with and 
without the disorder. 

Given the shortage of standout GWAS 
hits, should researchers continue to use the 
candidate-gene approach to form the basis 
for hypothesis-driven imaging genetics 
work? “It is still a point of debate,” Meyer- 
Lindenberg says. 


Sight set on height 
Height has produced clearer hits than schizophrenia,
but with a less than satisfying punch. In
2007, by analysing the genomes of nearly 5,000 
people, researchers were able to see that a vari- 
ant in a gene called HMGA2 explains some of 
height’s variability — about 0.3% (ref. 7). Since 
then, additional GWAS have revealed more than 
40 loci involved in height. Added together, these 
variants account for 5% of the trait’s variation. 
Even a clear quantitative trait doesn't necessarily 
provide simple answers. 

Genes are thought to contribute to roughly 60-80% of 
the variation in stature, leaving much of the heritability of 
height unaccounted for by GWAS findings. This ‘missing 
heritability’ has been a thorn in the side of the common- 
disease-common-variant hypothesis (see page 747). “In the 
field of height,” says Haines, “obviously that hypothesis is 
not completely correct.” 
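
The size of the gap can be put in numbers using only the figures quoted above: the known variants explain about 5% of the variation in height, whereas genes are thought to account for 60-80% of it. The short sketch below does that arithmetic; the function name is illustrative only.

```python
def share_of_heritability_explained(variance_explained, heritability):
    """Fraction of the heritable variation captured by the known variants."""
    return variance_explained / heritability


VARIANCE_EXPLAINED = 0.05          # ~5% of height variation, from 40-odd loci
HERITABILITY_RANGE = (0.60, 0.80)  # genes thought to explain 60-80% of variation

for h2 in HERITABILITY_RANGE:
    share = share_of_heritability_explained(VARIANCE_EXPLAINED, h2)
    print(f"heritability {h2:.0%}: known loci explain {share:.1%} of it")
```

On these figures, the loci found so far capture only roughly 6-8% of the heritable variation in height, leaving more than nine-tenths unexplained.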

But the news isn't all bad. “Optimistic people like me say
we didn't know anything about the genetics of height before
2007,” says Guillaume Lettre, a geneticist at the Montreal
Heart Institute in Quebec, Canada. “Now we have more
than 40 loci.”

Researchers may have loci, but they have little idea how 
these contribute to height. As with other traits, many of the 
associated SNPs fall within the vast regions between genes 
or within genes whose function is unknown. And with lit- 
tle funding for understanding height variation and scant 
biological footholds, the field sees very little follow-up of 
its GWAS leads. 

Lettre is collaborating with others with the hope of tying 
mystery SNPs to genes through animal models. “Basically 
what we are doing is taking the genes near these markers 
and looking at the expression of these genes in tissues that 
are relevant to height,” he says. “There are not so many:
bone, cartilage and pituitary gland.” 

He and others are also trying to coax existing height data 
into revealing stronger associations by grouping hits based 
on a single molecular pathway. Hong-Wen Deng, at the
University of Missouri in Kansas City, is planning
to analyse pathways involved in either bone health
or stature. “Many genes which may have small effects
for height may not be detected if you analyse them
individually,” he says. But jointly their effects may be
detected. Others are looking at height at various
points with the hope that differences in growth
curves will reveal larger genetic associations.

Researchers have already examined height and
growth rates in about 3,500 people from Northern
Finland. Of 48 height-associated variants that
they tested, 12 were linked with the rate of growth
during infancy or puberty (ref. 8).


The verdict: some of the loci implicate molecular 
pathways already known to be involved in growth 
and development. A 1995 study had shown that a 
gene related to HMGA2 could influence height:
mice lacking the gene were shorter, whereas mice
with a truncated version developed gigantism (ref. 9).
The HMGA2 association has been further confirmed
by most, but not all, GWAS.

Predicting how hits outside genes will fare 
is more difficult, and depends to some extent 
on how close the hit is to the nearest gene. 
“If you look at the height loci, they are much
more likely to be near a gene that causes abnormal
skeletal growth, than a similarly sized random
set of loci,” says Joel Hirschhorn, a geneticist at the
Broad Institute in Cambridge, Massachusetts.

But the nearest gene is a poor marker for what is likely
to be causal, says Goldstein. “Depending on the genetic
model for what is causing the association, it could be nearby
or not nearby,” he says. In some instances changes in DNA
act on genes a million bases away. “It really is remarkable 
that there are hundreds of reported associations, and the 
number that you can actually track down to an actual cause 
of the association is probably countable on one hand.” 

Researchers point to height as a ‘model trait’ because 
it is simple to measure and relatively constant compared 
with phenotypes such as blood pressure or glucose level. 
Then again, in GWAS of height, tens of thousands of people 
have been necessary to see the slightest associations. As a 
model trait, that could be problematic. “In some ways, it is 
showing us the future for other traits,” says Karen Mohlke, 
a geneticist at the University of North Carolina at Chapel 
Hill who was involved in some of the initial height GWAS 
work. “What it means for many other complex traits is that 
there will be as many loci found, or more.”
Kelly Rae Chi is a freelance writer in Cary, North Carolina.

THE DISAPPEARING


Ten years ago, Don Mavinic was working
on a way to get rid of a pesky precipitate
that plugs up the works of waste-water 
treatment plants. Known as struvite, 
the solid crud forms in pipes and pumps when 
bacteria are used to clean up sewage sludge.
Mavinic, a civil engineer at the University 
of British Columbia in Vancouver, Canada, 
realized that struvite was more than just rub- 
bish. A combination of phosphate, magnesium 
and ammonium, struvite contains many of the 
essential nutrients that plants need. Mavinic 
has developed a way to remove the precipitate 
during the water-treatment process and he is 
now selling it as a ‘green’ fertilizer. His tech- 
nology was first used commercially in 2007 
in a treatment plant in Edmonton, Alberta, 
Canada. It has since been exported to a plant 
in Portland, Oregon, which began using it this 
year. A sewage works in Derby, UK, success- 
fully tested the technology in September. 
Aside from finding a use for a troublesome 
by-product, the recycling of struvite could also 
help solve a much bigger problem: the dwindling 
supply of phosphate rock. All life forms require
phosphorus in the form of phosphate, which
has an essential role in RNA and DNA and in 
cellular metabolism. Every year, China, the 
United States, Morocco and other countries 
mine millions of tonnes of phosphate from the 
ground (pictured above), the bulk of which is 
turned into fertilizer for food crops. But such 
deposits are a finite resource and could disap- 
pear within the century. 

Experts disagree on how much phosphate 
is left and how quickly it will be exhausted. 
But many argue that a shortage is coming and 
that it will leave the world’s future food supply 
hanging in the balance. 

“I am starting to think phosphate rock is 
becoming a strategic material for many coun- 
tries. In the future it’s going to become more and 
more valuable,” says Steven Van Kauwenbergh
of the IFDC, an International Center for Soil 
Fertility & Agricultural Development based in 
Muscle Shoals, Alabama. Indeed, as political 
and social tensions build over the reserves of 
phosphate rock, the world could move from an 
oil-based to a phosphate-based economy, say 
some scientists and industry representatives.

Phosphate-based fertilizers have helped spur agricultural gains in the past century, but the world
may soon run out of them. Natasha Gilbert investigates the potential phosphate crisis. 


“It is a very curious thing that something so
important is so poorly understood and so little 
talked about in the larger political arena,” says 
Arno Rosemarin, a water-resources specialist 
at the Stockholm Environment Institute who 
has researched global phosphate use. Although 
international leaders have not tended to focus on 
the potential for phosphate shortages, the issue 
has been proposed for discussion next month at 
a United Nations meeting on global food secu- 
rity — an indication that it is starting to attract 
the attention of the international community. 


Just decades left? 

In many countries, phosphorus is a limiting 
plant nutrient in short supply in the soil. So 
farmers add phosphate-based fertilizers to 
increase agricultural yields. That has spawned 
a global phosphate-mining industry with sales 
totalling in the tens of billions of dollars. 

The US Geological Survey (USGS) in 
Reston, Virginia, estimates that around 62 bil- 
lion tonnes of phosphate remain in the ground 
(see graphic). This includes 15 billion tonnes of 
deposits that are mineable at present and others
that are not being exploited. The latter are left
in the ground mainly because they contain too 
many impurities — such as cadmium and other 
toxic metals — or because they are offshore in 
difficult-to-reach places. 

In 2008, 161 million tonnes of phosphate 
was mined around the world, according to 
the latest, as yet unpublished, figures from the 
US Geological Survey. Stephen Jasinski, phos- 
phate-rock commodities expert at the survey, 
says that demand for fertilizers is predicted to 
grow by 2.5-3% per year for the next 5 years. If 
that rate continues, the world’s reserves should 
last for around 125 years. 

That is a relatively optimistic timescale, 
but it is echoed by the International Fertilizer 
Industry Association in Paris, whose members 
include 90% of the world’s fertilizer producers. 
Michel Prud’homme, executive secretary of 
the association's Production and International 
Trade Committee, says that the industry antici- 
pates that demand for fertilizers will grow at a 
“fairly moderate rate’, slowing by the middle of 
the century. That would enable reserves to last 
for at least another 100 years. 

But others predict a faster growth in demand
for fertilizers, which would deplete phosphate
reserves even more quickly. The increased use will be
driven in part by the rising global population,
which will require food production to at
least double by 2050, according to the Food
and Agriculture Organization of the United
Nations (FAO).

Rosemarin and others say that nations should 
not rely on the reserves laden with impurities 
or located offshore because of the costs — both 
environmental and economic — of extracting 
usable phosphate. The remaining accessible 
reserves of clean phosphate rock would run out 
in 50 years, if growth stays at 3% per year, says 
Rosemarin. 
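
How sensitive such timescales are to the assumed growth rate can be seen with a simple geometric-growth model. The sketch below is an illustration only. It assumes that production starts at the 161 million tonnes mined in 2008 and grows at a fixed rate every year, and it draws on the 15 billion tonnes that the USGS counts as mineable at present. It is not the calculation used by the USGS, the industry or Rosemarin, whose estimates rest on their own assumptions about reserves and demand; the function name is hypothetical.

```python
import math


def years_until_depleted(reserves_mt, production_mt, growth):
    """Years until cumulative extraction equals the reserves.

    Production in year k is production_mt * (1 + growth) ** k, so the total
    mined after n years is production_mt * ((1 + growth) ** n - 1) / growth.
    Setting that equal to reserves_mt and solving for n gives the result.
    """
    if growth == 0:
        return reserves_mt / production_mt
    return math.log(1 + reserves_mt * growth / production_mt) / math.log(1 + growth)


MINEABLE_RESERVES_MT = 15_000  # 15 billion tonnes, in millions of tonnes
PRODUCTION_2008_MT = 161       # million tonnes mined in 2008

for growth in (0.0, 0.025, 0.03):
    years = years_until_depleted(MINEABLE_RESERVES_MT, PRODUCTION_2008_MT, growth)
    print(f"growth {growth:.1%} per year: about {years:.0f} years")
```

Under these assumptions the deposits that are mineable at present last about 93 years at constant output but only about 45 years with 3% annual growth, which is of the same order as Rosemarin's 50-year figure.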

But the estimates all suffer
from a lack of reliable data.
Most of the world’s phosphate-mining
companies are integrated
with fertilizer firms and
the mines are either owned by
the companies or are under
state control, says Prud’homme.
As a result, it is difficult to get
accurate, independent information on phosphate
reserves.

Eric Kueneman, deputy director of the FAO's
plant production and protection division, says,
“the reality is we as a public institution don't
really know what the industry knows and nor 
do they know among themselves. To give a reli- 
able answer to the question, ‘will phosphates 
run out?’ we need a crystal ball.” 

The International Fertilizer Industry Asso- 
ciation collects data from its members on their 
existing reserves and on potential upcoming 
capacity. But some experts question the accu- 
racy of these data because they are supplied by 
producers who might be disinclined to provide 
proprietary information that could harm their 
commercial positions. 


No agreement 

There is also a lot of uncertainty over the data
supplied by governments, as is the case
with China and Morocco, says Dana Cordell,
who has just completed her doctoral thesis on 
the effect of phosphate reserves on food secu- 
rity at the University of Technology Sydney in 
Australia. For example, when China joined 
the World Trade Organization in 2001, its 
reported reserves of phosphate rock instantly 
jumped from just over 2 billion tonnes to 
nearly 8 billion tonnes (ref. 1).

Cordell and Kueneman call for independent 
data collection on phosphate rock reserves. 
“Unlike for energy, water or nitrogen, there is 
no single international organization respon- 
sible for phosphate resources. That is very 
concerning,” Cordell says. 

The IFDC hopes to generate more solid 
data about the extent of the world’s phosphate 
resources and reserves. It will soon launch a 
project that will query phosphate producers,
academics and other minerals specialists to
collect extensive data on how much phos- 
phate there is, how pure it is, what might be 
available in the future and the useful life of 
existing mines. Van Kauwenbergh, who is 
leading the project, expects to publish the first 
round of data in May next year. If the centre 
secures more funding, he hopes to continue the 
research for another 5 years. 

The USGS figures on phosphate reserves are
the most-quoted publicly available information.
But there are problems
with them because the agency
gets its information from foreign
governments, not directly
from producers, and it is not
independently verified. “We
just don't know how good the
USGS data are because they
are based on second- and third-hand
information. The figures
change all the time,” says Van Kauwenbergh.

Some people who track the phosphate indus- 
try say that there is no cause for concern about 
phosphate running out. “I don’t think this is an 
immediate crisis, but it is something we should 
be paying attention to,” says Jasinski. 

Prud’homme is sanguine about prospects 
for the future. If demand rises, then so will 
prices, he says, allowing companies to explore 
for new reserves and mine those that are harder 
to reach or from a lower grade of rock. “We 
feel there are enough reserves to meet food and 
material needs,” he says.

For example, companies have recently begun 
to investigate deposits in Peru, Australia and 
off the coast of Namibia that were not pre- 
viously considered financially viable, says 
Prud’homme. These resources are not fully
taken into account in the most recent USGS 
figures on world phosphate reserves, he says. 
And as some existing mines are tapped out, 
others are opening up in places such as Saudi
Arabia. “I am convinced there are other sources
we have not yet found, but it is difficult to say 
how much impact these will have,” he says. 

Others are sceptical that further exploration 
will uncover large new deposits or that they 
will solve the longer-term problem. “We are 
not going to find another Morocco,” says Jasinski,
referring to the country with the biggest
remaining reserves. 

In the meantime, companies have started to 
invest in new technologies to exploit the lower- 
grade and offshore deposits. The impetus for 
this move into more costly production was the 
hike in phosphate rock prices in 2008, when the 
value temporarily spiked at US$500 per tonne, 
more than five times the average price in 2007 
(ref. 2). Prices had remained comparatively 
flat for the previous five years. The price hike 
was due to tight supplies of the rock caused by 
increased demand for phosphate-based fertiliz- 
ers in India and China as well as record energy 
prices. Phosphate prices have since dropped 
back to their pre-spike levels. 


Few alternatives 

Despite the investments in unconventional 
reserves, those deposits may not be viable in 
the long term. Jan-Olof Drangert, an expert 
in water and land resources at Linköping
University in Sweden, says that lower-grade 
reserves are “not a solution” if the world wants 
a sustainable system. Not only will extracting 
lower-grade phosphates be very expensive, it 
will also pollute the soils with cadmium, which 
is highly toxic to plant and animal life even 
in low doses, he says. “And then there is still 
the problem of exhausting these lower-grade 
reserves,” he adds.

The increase in demand for fertilizer in 
2008 may have been a taste of things to come, 
especially if demand for food rises as fast as 
some estimates suggest. The price hike last year 
“was a huge shock to farmers”, says Cordell.
Fertilizers had to be rationed in some cases. 

“The bottom line is that it will just cost more 
to eat,” says Rosemarin. “There will be no cheap
lunches any more.” 


Struvite build-up in water-treatment pipes could 
be a valuable source of phosphate. 




Making fertilizers go further

No matter how much phosphate is left to be
extracted from the ground, cutting down on the
use of phosphate-based fertilizers and improving
their efficiency could make a significant
improvement, says Alan Townsend, a biogeochemist
at the University of Colorado in Boulder.
“Fertilizer is seen as a cheap insurance policy.
Farmers tend to overuse it because they don't
want to be caught out,” he says.

In the past two decades, the United States and
Europe have reduced the widespread
over-application of fertilizers, but that strategy
continues to be a problem in other parts of the
world, says Townsend. One of the biggest culprits
is China (ref. 5), where farmers are applying
nearly twice as much fertilizer as is needed in
the production of wheat.

Experts disagree, however, on whether excess
fertilizer application is actually unwarranted.
Tony Vyn, an agronomist at Purdue University in
West Lafayette, Indiana, says that the overuse of
fertilizers in the European Union and United
States has built up phosphate reserves in the
soil. Farmers are now taking advantage of that by
applying less phosphate than the crops actually
need each year. So the strategy of China's
farmers may not be unreasonable, he says.

Other gains toward preserving phosphate resources
could come through improved industrial practices.
Between 40% and 60% of phosphate is lost when its
host rock is converted to fertilizer. Researchers
are now looking to reduce that wastage. N.G.


The uncertainty over the world’s phosphate
reserves is compounded by the fact that supply
is concentrated in just a few hands. China,
Morocco, the United States and Russia together
hold more than 70% of the global phosphate
deposits (ref. 3), presenting the possibility of
“market manipulation”, says Amit Roy, president
of the IFDC.

Evidence of strategic manoeuvring can
already be seen. In March 2004, the United
States and Morocco signed a free-trade agreement
that covered phosphate rock, among other
commodities. In 2008, Morocco exported
$65-million worth of fertilizer to the United
States (ref. 4). Although the United States
has one of the world’s largest phosphate rock
reserves, the nation will see a significant drop
in production in 25 years when it is estimated
that production will peak at its key mines in
Florida. The deal with Morocco, says Rosemarin,
is aimed at securing the United States' future
fertilizer and food supply.

In the case of some finite resources, such as
oil, alternatives can be found. But there are
currently no substitutes for phosphates. Cutting
usage will help to make reserves last longer (see
‘Making fertilizers go further’).

But most agree that some of the biggest gains
will probably be made from the recovery and
recycling of phosphates, such as Mavinic’s
work mining the phosphate deposits inside
water-treatment plants. In a back-of-the-envelope
calculation, he estimates that if all
domestic wastewater facilities in Canada
were converted into biological treatment systems
using his technology, the country could
produce enough fertilizer to meet about 30%
of its current needs.

That pales, however, when compared with a 
much richer — and more pungent — source of 
phosphate: the manure generated by dairy and 
pig farming. Livestock waste contains around 
five times more phosphate than human waste. 
And the global livestock population is around 
65 billion, more than ten times the human 
population. There is “enormous potential” for 
recovering phosphates from 
livestock waste, says Mavinic, 
who has turned his attention 
to doing just that. 

The problem his research 
team is trying to solve is that 
phosphates in livestock waste 
are not in a dissolved form, 
which is necessary to make 
struvite. If programmes to 
recover phosphates from livestock waste suc- 
ceed, “the sky is the limit’, says Mavinic. “We 
would probably not have to import any fertilizer 
into this country.” 

But all this takes time. Decades may pass 
before recycling technologies gear up and new 
supplies of phosphate come on line. At present, 
nations have expressed little concern over the 
finite phosphate resource and are eagerly con- 
suming reserves. When solutions do eventually 
emerge, the world could already be in the grip 
of a fertilizer and food shortage. 
Natasha Gilbert is a reporter based in Nature's 
London office. 
Toss a rock into a quiet pond, and watch 
the ripples spread out across its surface. 
This is pretty much what happens 
when a photon hits the surface of a 
metal — except that in this case, the ‘ripples’ 
consist of electrons oscillating en masse and 
have wavelengths measured in nanometres. 
Once they are set in motion, these ‘surface 
plasmons’, as the oscillations are known, can 
pick up more light and carry it along the metal 
surface for comparatively vast distances. “A 
river of light” is how Satoshi Kawata, a physi- 
cist at Osaka University in Japan, describes the 
phenomenon to his students. 

Plasmons can also focus light into the 
tiniest of spots, direct it along complex circuits 
or manipulate it many other ways. And they 
can do all of this at the nanoscale — several 
orders of magnitude smaller than the light’s 
own wavelength, and therefore far below the 
resolution limits of conventional optics. 

The result is that plasmonics has become one 
of the hottest fields in photonics today, with 
researchers exploring potential applications in 
solar cells, biochemical sensing, optical com- 
puting and even cancer treatments (see ‘Plas- 
mons at work’). 

Their efforts, in turn, have benefited greatly 
from the flowering of nanotechnology in gen- 
eral over the past decade, which brought with 
it a proliferation of techniques for fabricating 
structures at the nanoscale — exactly what 
plasmonics needed to progress from laboratory 
curiosity to practical applications. “The late 
1990s was kind of the turn- 
ing point” for plasmonics, 
says Harry Atwater, a 
physicist at the California 
Institute of Technology in 
Pasadena. 

One surprising example 
of the light-carrying phe- 
nomenon was witnessed 
in 1989 by Norwegian- 
born physical chemist 
Thomas Ebbesen, now at 
the Louis Pasteur Univer- 
sity in Strasbourg, France. As he held to the 
light a thin film of metal containing millions 
of nanometre-sized holes, he found that it was 
more transparent than he expected. The holes 
were much smaller than the wavelength of vis- 
ible light, which should have made it almost 
impossible for the light to get through at all. “I 
first thought, ‘Here was some kind of mistake’,” 
says Ebbesen. 

But it wasn't a mistake, although it took 
Ebbesen and his colleagues the better part of a 
decade to work out what was happening. When 
the incoming photons struck the metal film, 
they excited surface plasmons, which picked 
up the photons’ electromagnetic energy and 
carried it through the holes, re-radiating it on 
the other side and giving the film its transparency. 

Light manipulation: surface plasmons could be generated (above) to help 
direct light using nanoantennas in devices such as solar cells (left). 

Hole arrays are increasingly finding their 
way into applications, for example as selec- 
tive filters for colour sensors. It turns out that 
the increased transmission through the sheet 
works only for light around the plasmons’ 
natural oscillation frequency. But this fre- 
quency, which is typically in the visible or 
near-infrared part of the spectrum, can be 
adjusted by changing the geometry of the 
holes and their spacing. So hole arrays can be 
made into highly selective filters for sensors 
that depend on detecting specific colours, or 
for efficiently extracting monochromatic light 
from light-emitting diodes (LEDs) and lasers. 
Indeed, a number of commercial research labs, 
such as the Panasonic laboratory in Kyoto, 
Japan, and NEC in Tsukuba, Japan, are working 
on prototypes of plasmon-enhanced devices 
for displays and telecommunications. 

Hole arrays can also be used to channel 
light into optical devices. In imaging chips for 
digital cameras, for example, researchers are 
studying how hole arrays placed on top of individual 
pixels might help capture incoming light 
more efficiently, and thus reduce pixel noise 
and improve camera sensitivity. 


R. VAN LOON/A. POLMAN 


Plasmons at work 


Although plasmonic effects have 
been known for more than a 
century, the history of plasmon-based 
applications began in 
the early 1970s, when Martin 
Fleischmann, a chemist at the 
University of Southampton, UK, 
and others began to study how 
light scatters from molecules 
stuck to a silver surface. 

Richard Van Duyne, a chemist 
at Northwestern University in 
Evanston, Illinois, then discovered 
this scattering to be enhanced by a 
seemingly incredible six orders of 
magnitude. 

In today's optimized devices, 
this enhancement, known 
as surface-enhanced Raman 
spectroscopy (SERS), can be 
several orders of magnitude larger 
still, strong enough to detect 
a single molecule. Moreover, 
SERS has proved very useful in 
the biochemical and materials 
sciences by providing information 
on the chemical composition 
of molecules at very small 
concentrations. 

SERS is a plasmonic effect: 
silver nanoparticles act as 
antennas that take the incoming 
laser light and, through their 
surface plasmons, concentrate 
it. The concentrated light is then 
scattered by nearby molecules 
and amplified again by the silver 
nanoparticles on the way back 
out. This dual amplification 
results in a huge overall signal 
enhancement. 

Some applications have reached 
the market. For example, in 
specifically prepared colloids of 
gold nanoparticles, a clustering of 
these nanoparticles is triggered 
by the presence of pregnancy 
hormones. This leads to a colour 
change induced by plasmonic 
effects that has been widely 
commercialized in pregnancy 
tests. 

The commercialization of SERS 
has been hampered in many areas 
by difficulties in achieving highly 
accurate control over the surface 
nanostructures. For this reason, 
researchers are also looking at 
other sensing techniques such 
as localized surface plasmon 
resonance (LSPR). The idea is 
that, when a surface is covered 
with nanostructures in the 
shape of rods or triangles, their 
plasmonic properties depend 
strongly on the properties of the 
medium that surrounds them. For 
example, a solution containing 
a certain type of molecule has 
a refractive index that varies 
with the concentration of those 
molecules. “These changes 
to the refractive index lead to 
measurable changes to the 
surface plasmon resonance 
wavelength, which can be 
observed experimentally,” says 
Stefan Maier from Imperial 
College London, who studies 
plasmonic nanostructures and 
their applications. “The effects 
can be dramatic.” Devices based 
on LSPR are becoming so sensitive 
that Van Duyne thinks that they, 
too, are about to reach the limit of 
single-molecule detection. 


Naomi Halas (centre, above) wants to use plasmons to fight cancer; 
others use them as sensors (inset) to detect single molecules. 

And at Rice University in 
Houston, Texas, biomedical 
engineer Naomi Halas is pursuing 
an optical technique to destroy 
cancer cells. She hopes to 
inject cancer patients with gold 
nanoparticles that will be guided 
to the tumour by antibodies bound 
to the particles’ surface. Once the 
nanoparticles are in place, she can 
illuminate the area with a low dose 
of infrared laser light that leaves 
healthy tissue undamaged, but 
gets absorbed to create plasmons 
in the gold. The energy heats up 
the nanoparticles and kills the 
cancer cells". 
So far, Halas's cancer therapy 
has been successful in trials 
with mice, where she achieved 
seemingly complete elimination 
of the tumours. The technology is 
now in human clinical trials with 
patients who have head and neck 
cancers. Halas says the results 
have been very encouraging 
so far. “There is no reason one 
would expect complications 
from something like this in 
humans relative to animal trials, 
because you are using physical 
mechanisms, heat and light, to 
induce cell death.” Halas is also 
optimistic that the treatment will 
be approved for use more quickly 
than a drug, which can involve 
difficult and expensive trials 
and many years to reach the 
clinic. She says the technique is 
being considered as a ‘device’ 
by the US Food and Drug 
Administration rather than a 
drug, which could also accelerate 
the approval process. J.H. 


Another plasmonic technique for channelling 
light into a device is to sprinkle its surface 
with nanoscale particles made of a metal such 
as gold. These nanoparticles function like an 
array of tiny antennas: incoming light is taken 
up by plasmons and then redirected into the 
device's interior. 


Slimming down 

From a commercial perspective, perhaps the 
most promising application of such nano- 
antennas — or indeed, of hole arrays — is in 
the improvement of solar cells. Present-day 
solar cells are made from semiconductors such 


as silicon. But to catch as much light as pos- 
sible from the broadest range of wavelengths, 
particularly in the red and infrared part of the 
spectrum, the semiconductor layer has to be 
relatively thick. “Right now a silicon solar cell 
is up to 300 micrometres thick,” says Albert 
Polman, a photonics researcher who directs 
the AMOLF institute in Amsterdam, where 
he works on improving solar-cell designs. And 
when cells are being deployed in arrays that 
cover a rooftop or more, he says, that adds up 
to a lot of expensive silicon. The price would 
come down a long way if the silicon was only 
1 micrometre thick. “But then you don’t catch 
the red light because it goes straight 
through the chip,” he says, thus wasting 
much of the sunlight’s available energy. Other 
solar-cell materials have the same problem. 
With plasmonics, however, the problem 
goes away. In one approach that researchers 
are exploring, gold nanoparticles on the sur- 
face would act as reflectors that focus light into 
the semiconductor, where absorption effi- 
ciency increases with the light concentration. 
In another scheme, tiny gold nanoantennas 
could redirect sunlight by 90°, so that it prop- 
agates along the semiconductor rather than 
passing straight through. Either way, the cell 




J.C. HULTEEN ETAL. J. PHYS. CHEM. B 103, 3854-3863 (1999) 


could get by with a much thinner 
semiconductor layer. 

Even as plasmonic techniques are 
decreasing the cost of the cells, they 
could also greatly improve the cells’ 
efficiency at extracting the available 
energy from sunlight, in a field in 
which even a few percentage points 
in efficiency improvement are cele- 
brated. Overall, the use of plasmon- 
ics could increase the absorption 
two to five times, says Atwater, who 
has co-founded Alta Devices in Santa 
Clara, California, to commercialize 
such solar cells. For cells made from 
amorphous silicon, which today 
have efficiencies of around 10-12%, 
the predicted enhancements could 
translate into efficiencies of about 
17%. For crystalline silicon cells, 
which currently have efficiencies 
around 20%, the new figure could 
approach the theoretical maximum 
of 29%. For commercial applications, 
the remaining challenges include 
developing workable device designs 
and fabrication techniques for mass 
production. 


Guiding light 


Plasmonics researchers are also grappling with a longer-term 
challenge: the integration of optics and electronics on 
a single microchip. The decades-old idea is 
that, just as a fibre-optic cable can carry much 
more information than a copper wire, a light 
beam could, in principle, relay information 
through the chip on more channels and at a 
higher speed than conventional integrated circuitry 
can handle. But the experimental optical 
devices produced to date have been too 
large, and have shown rather high losses in the 
optical signal strength. 

“You want to bring the optics 
closer in size to the transistor,” 
says Polman. And that’s the 
beauty of plasmonics, which 
can offer optical pathways on 
virtually the same scale as the silicon structures 
found in advanced microchips. “Metals 
can be well integrated with the chip design,” 
says Polman, “so you may be able to distribute 
light over an integrated circuit by plasmons.” 
Indeed, structures such as silver nanowires or 
grooves etched into metal surfaces can provide 
pathways that guide light across a chip in whatever 
direction the designers might need. 

But there is a trade-off as the structures get 
smaller. If the plasmons are forced to travel 
through a channel that’s too narrow, they start 
to leak out from the sides and get lost, says 
Sergey Bozhevolnyi from the University of 
Southern Denmark in Odense, who is leading 
a European research project into integrated 
plasmonic circuits. Nevertheless, researchers 
can guide surface plasmons over distances of 
more than 100 µm, which is roughly a thousand 
times bigger than the features on a current-generation 
microchip. This is enough to open 
rich possibilities for plasmonic 
nanocircuits, in which light 
would carry information along 
complex paths and through 
many processing steps. 

Plasmonic waveguides are 
particularly promising if the 
light source, typically a laser, 
can be incorporated on the chip as well. 
This has been done with comparatively large 
lasers, on the order of the wavelength of the 
laser light. But plasmonics now offers the possibility 
of doing so at the nanoscale, at lengths 
much shorter than the wavelength. Rather than 
amplifying light in a conventional laser cavity, 
a plasmonic ‘spaser’ would amplify it with the 
help of plasmons; the first experimental 
evidence for such plasmon-based lasing was 
published in August. To fully integrate these 
plasmon lasers into standard microcircuitry, 
however, researchers will need to 
find a way to trigger the spasers using 
standard electrical currents. 

In addition to creating light and 
guiding it across a chip, optical computing 
will require a way to turn the 
flow of plasmons on and off at high 
speeds, so that the flow becomes a 
series of bits in a digital data stream. 
Many people have been working on 
such devices, and a plasmonic modulator 
based on silicon technology has 
been realized by Atwater’s group. Like 
a conventional transistor, in which an 
electric voltage controls a tiny electrical 
current, the group’s device is 
based on the use of an electric field 
to control the propagation of surface 
plasmons through the device. Apart 
from their small size, compared with 
conventional optical counterparts, 
the operation frequency of plasmonic 
modulators can easily reach tens of 
terahertz, well above the gigahertz 
regime of modern computers. 

Many roadblocks still remain to the 
commercialization of such technologies, 
ranging from the integration 
with silicon to device issues. “The key 
thing that keeps coming back are losses 
in the metals,” says Mark Brongersma, 
a materials scientist at Stanford University 
in California. However, he adds, smart 
design of the plasmonic structures could, in 
principle, reduce losses to acceptable levels. 

Plasmon resonance could be used to make very sensitive biochemical 
sensors (yellow bars). The waves here represent absorption spectra. 

Plasmonics research has made remarkable 
progress in the past decade, and researchers 
are working on pushing our knowledge of plas- 
mons even further, for example to understand 
the physics very close to the metal surface. 
Nonetheless, says Atwater, “what has hap- 
pened in the past seven or eight years is that 
plasmonics has given to photonics the ability to 
go to the nanoscale and properly take its place 
among the nanosciences.” 

Joerg Heber is a senior editor at Nature 
OPINION 


CORRESPONDENCE 


Sanctions against 
scientists threaten 
progress 


SIR — Several European 
countries, including France, the 
Netherlands and Sweden, are now 
routinely refusing work visas and 
study positions in the physical 
sciences to Iranian scientists and 
students. This is in response to 
UN resolution 1737, imposing 
sanctions on Iran for failing to halt 
uranium enrichment, and reflects 
international concern about the 
potential proliferation of nuclear- 
weapons technology. The result 
has been blanket discrimination 
simply on the basis of nationality. 

Similar national-security 
policies against academics 
operate in the Middle East. The 
permit criteria used by Israel's 
security services for Palestinian 
postgraduate students are so 
restrictive that they effectively 
prohibit entry. Israeli universities 
have protested, in a letter sent to 
the defence ministry, that these 
policies constitute “a gross and 
harmful intervention by military 
elements in purely academic 
considerations” (see 
http://go.nature.com/iFljgR). 

The International Council 
for Science (ICSU) affirms, 
in its principle of universality 
in science, that all scientists 
should have the opportunity to 
participate in legitimate scientific 
activities. ICSU’s committee on 
freedom and responsibility in the 
conduct of science, which I chair, 
continuously monitors breaches 
of this principle. We have recently 
called for the scientific community 
to commit to opposing all such 
discrimination (see 
http://go.nature.com/2UmM5M). 

Academic institutions should 
have the responsibility and 
freedom to select students 
and staff without political or 
military interference. If selected 
individuals are refused entry or a 
work visa after security screening, 
the reasons should be made clear 
to that person. 

International collaboration 
and openness in science 


education and research are 
essential for meeting pressing 
global challenges. Systematic 
discrimination against scientists 
based on nationality is a serious 
threat to scientific progress. 
Bengt Gustafsson Department of 
Physics and Astronomy, 

Uppsala University, Box 515, 

75120 Uppsala, Sweden 

e-mail: bengt.gustafsson@fysast.uu.se 


Measures urgently 
required to prevent 
multiple submissions 


SIR — A recent experience leads 
me to believe that defiance 

of rules against simultaneous 
submission of papers to different 
journals may be growing more 
widespread. 

In a thorough review of a 
submitted paper (not for this 
journal), I pointed out that the 
study itself and the organization 
of the manuscript were below 
standard; I offered substantive 
constructive comments and 
recommended reconsideration 
after major revision. When the 
revised manuscript arrived, I 
made further suggestions for 
improving its scientific quality. 

At this point, and while the 
manuscript was technically 
still under consideration by the 
journal in question, I noticed in 
a routine online search that it 
had been published in a different 
peer-reviewed journal offering 
rapid publication. Evidently, 
the authors had submitted the 
manuscript to the other journal, 
either simultaneously or after 
having received the reviewers’ 
comments, without withdrawing 
it from the first. They had even 
incorporated some of the 
comments from my original 
review. 

Cases of duplicate submission 
are disconcerting for journals and 
for the scientific community. They 
seriously violate the principle of 
disseminating scientific findings 
with professionalism and integrity. 
The practice is in breach of the 
authors’ contract to withhold 
submission of their manuscript 
to other journals until the editors 
have made a formal decision not 
to publish it. 

As the pressure to publish 
new results rapidly increases and 
competition becomes ever more 
intense, editors must define strict 
reinforcing measures to prevent 
such violations. 

Goudarz Molaei The Connecticut 
Agricultural Experiment Station, 
123 Huntington Street, New Haven, 
Connecticut 06504, USA 

e-mail: goudarz.molaei@ct.gov 


Nature journals forbid duplicate 
submission: http://go.nature.com/dthpdU 


Caution with claims 
that a species has 
been rediscovered 


SIR — We welcome the 

recent announcement by the 
conservation partnership 
BirdLife International that they 
have launched a “global bid to 
try to confirm the continued 
existence of 47 species of bird 
that have not been seen for up to 
184 years” (see http://go.nature.com/6Hc2Cn). 
But there are 
pitfalls, as the recent history of 
‘rediscoveries’ has shown. 

One of the species on BirdLife’s 
target list is the ivory-billed 
woodpecker (Campephilus 
principalis), a bird that was 
prematurely alleged to have 
been rediscovered in 2005. 

This seemingly improbable 
reappearance provoked intense 
debate within the scientific 
community about the veracity 

of claimed sightings and, more 
generally, about what represents 
sufficient proof of continued 
existence (or extinction). 
Accusations of ‘faith-based’ 
ornithology resulted, increasing 
scepticism among politicians and 
policy-makers that conservation 
organizations are often too willing 
to put public relations before 
scientific rigour. 

Many rediscoveries in the 
developing world are made by 
individuals or organizations from 
Europe or the United States, or are 
a direct result of Western-backed 
expeditions or initiatives. This 
wrongly reinforces the impression 
that only Western scientists 

are competent to find and save 
threatened species. In addition, 
high-profile rediscoveries can 
create an unexpected imperative 
for immediate action by hard- 
pressed national conservation 
organizations. 

The international conservation 
community often seems to want 
it both ways, being unwilling to 
declare a species extinct but 
enthusiastically proclaiming the 
rediscovery of an ‘extinct’ species. 
This ambiguity is understandable 
— high biogeographic uncertainty 
can be generated both by the 
IUCN Red List requirements 
for ‘exhaustive surveys’ before 
a species is officially declared 
extinct, and by frequent 
taxonomic revisions that propel 
rarely seen subspecies to full 
species status. Rediscoveries 
are only meaningful if backed up 
by a self-sustaining population. 
Otherwise, conservationists are 
merely engaged in the sad task 
of documenting the prolonged 
demise of yet another species. 

The genuine rediscovery of 
‘lost’ species is a newsworthy 
event that helps bolster 
the pioneering, field-based 
credentials of conservation and 
draws attention to new sites 
worthy of increased protection. 
The combination of technology 
and improved access makes 
finding these species easier than 
ever. The real challenge is how 
to present rediscoveries to the 
public in a way that reflects their 
conservation significance and that 
will best encourage the support of 
future conservation efforts. 
University Centre for the Environment, 
Steve Jennings Oxfam GB, South Asia 

An agenda for personalized medicine 


Pauline C. Ng, Sarah S. Murray, Samuel Levy and J. Craig Venter find differences in results from two direct-to- 
consumer genetics-testing companies. They therefore give nine recommendations to improve predictions. 


More than 1,000 DNA variants 
associated with diseases and traits have 
been identified. Direct-to-consumer 
(DTC) companies are harnessing these discov- 
eries by offering DNA tests that provide insights 
into personal genetic traits and disease risks. 
Genetic testing can improve lifestyle choices 
and increase preventive screening’. However, 
understanding of the genetic contribution to 
human disease is far from complete. 

There is debate in the genetics community 
as to the usefulness of DTC testing. Therefore, 
we compared results from two DTC companies 
(test kits provided by genomics companies 
23andMe in Mountain View, California, and 
Navigenics in Foster City, California) on 13 dis- 
eases for 5 individuals. Despite this limited data 
set we find potential implications for personal- 
ized medicine. Here we provide recommenda- 
tions to improve predictions and support the 
continued growth of this nascent industry. 

DTC genome scans are easy to get. Users 
order tests online, provide saliva or a cheek 
swab, and within a few weeks 500,000- 
1,000,000 of their DNA variant markers are 
scanned. The service provider then calculates 
a set of disease risks based on the customer’s 
specific combination of markers, and presents 
the results to the user online (see graphic). 

The accuracy of DTC genome-scan tests 
has been questioned. It is our assessment that 
the accuracy of the raw data is high. We found 
that the genotypes, or particular DNA bases 
observed, of an individual’s markers from 
23andMe and Navigenics agreed more than 
99.7% of the time. This is similar to accuracies 
reported by the genotyping companies. 
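
The agreement figure above is a genotype concordance: the share of markers typed by both services for which the two calls match. As a rough sketch of how such a figure could be computed (the function name, marker IDs and genotype calls below are hypothetical and are not data from the study):

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of shared markers whose genotype calls match.

    Each argument maps a marker ID to a two-letter genotype such as "AG".
    Allele order is ignored, so "AG" and "GA" count as the same call.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two scans share no markers")
    matches = sum(
        1 for marker in shared
        if sorted(calls_a[marker]) == sorted(calls_b[marker])
    )
    return matches / len(shared)


# Hypothetical calls for illustration only (not data from the study).
scan_a = {"m1": "AG", "m2": "CC", "m3": "TT", "m4": "AA"}
scan_b = {"m1": "GA", "m2": "CC", "m3": "CT", "m5": "GG"}

rate = genotype_concordance(scan_a, scan_b)
print(f"{rate:.1%} of shared markers agree")
```

Only markers present in both scans are compared here; how the authors handled markers typed by just one company is not stated in the text.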


Two other major concerns are whether the 
predicted disease risks have any clinical valid- 
ity, and how well a genetic variant correlates 
with a specific disease or condition’. A few 
individuals have alluded to getting different 
predictions from different DTC companies for 
the same disease*®. We compared the consist- 
ency of disease-risk predictions between the 
two DTC companies to see where differences 
may arise (see Table 1). 

Both companies report absolute risk, 
which is the probability that an individual 
will develop a disease. Absolute risk is derived 
from two parameters: ‘relative risk’ and ‘average 
population disease risk’. Relative risk is modelled 
from an individual's genetics. Average 
population disease risk varies depending on 
how one defines the population. For example, 
Navigenics distinguishes population disease 
risk between men and women (for example, 
men are more likely to have heart attacks than 
women), whereas 23andMe primarily takes 
into account age (for example, incidence of 
rheumatoid arthritis increases with age). This 
ambiguity in the definition of a ‘population’ 
underscores the caution one must exercise 
when interpreting absolute risk results. 

Even after we removed the average popula- 
tion risk variable we still found that only two- 
thirds of relative risk predictions qualitatively 
agree between 23andMe and Navigenics when 
averaged across our five individuals (see Table 
1). Certain diseases have better prediction 
agreement than others. For four diseases, the 
predictions between the two companies com- 
pletely agree for all individuals. In contrast, for 
seven diseases, 50% or less of the predictions 
agree between the two companies across the 
individuals. 


SUMMARY 

• For seven diseases, 50% or less of the predictions of two companies agreed across five individuals 
• Companies should communicate high risks better and test for drug response markers 
• Community should study markers in all ethnicities and look at behaviour after tests 

A major contributor to the discrepancies in 
disease-risk predictions is the set of markers 
that each service chooses to use in calculat- 
ing relative risk. Risk markers are determined 
from genome-wide association studies, which 
survey hundreds of thousands or millions of 
markers across control and disease patients’. 
Each marker has different possible alleles. 
Alleles that occur more frequently in disease 
patients are designated as risk alleles and 
have odds ratios greater than 1. For example, 
in Alzheimer’s disease patients, 38% of ApoE 
alleles are the ApoE 4 risk allele; this allele’s fre- 
quency is only 14% in normal controls*. The 
odds ratio for the ApoE 4 risk allele is 3.7 (odds 
of exposure in cases, divided by odds of expo- 
sure in controls is (0.38/0.62)/(0.14/0.86)). The 
greater the frequency disparity between dis- 
ease patients and normal controls, the higher 
the odds ratio associated with the allele. Con- 
versely, alleles conferring protection against 
disease are observed less frequently in disease 
patients and have odds ratios less than 1. 
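
The ApoE 4 arithmetic above can be checked with a few lines of Python. This is only a sketch of the odds-ratio formula quoted in the text; the function name is ours, and the two frequencies are the ones given in the paragraph.

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    for freq in (freq_cases, freq_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Frequencies quoted in the text: 38% in patients, 14% in controls.
apoe4 = odds_ratio(0.38, 0.14)
print(f"ApoE 4 odds ratio: {apoe4:.2f}")
```

With these inputs the ratio comes out at roughly 3.76, in line with the 3.7 quoted in the text. Swapping the two frequencies gives a value below 1, which is the protective-allele case described above.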

DTC companies harness the same publicly 
available research to decide which markers to 
include, and for the most part, could use the same 
or similar markers. Yet no disease has an identical 
set of markers between the two DTC companies 
because each company has its own criteria for 
accepting a genome-wide association result into 
its relative risk calculation”’’. Some markers are 
used by both companies for a particular disease. 
For identical markers and correlated markers, 
the odds ratios are similar between the two DTC 
companies (r=0.98 for identical markers; r=0.89 
for correlated markers). In other words, once 
DTC companies agree that a marker is predic- 
tive of disease, they tend to agree on its genetic 
contribution to disease predictions. 
estimate that 1% of markers in an 
individual will violate the assumption 
of perfect linkage disequilibrium, 
and in these cases, using a 
surrogate marker would be misleading. 
Although this percentage seems 
small, hundreds of markers are 
tested so there is likely to be at least 
one error. Instead, direct genotyping 
of the disease-associated marker 
would improve accuracy of risk 
genotypes. Some DTC companies 
already target specific key markers 
not on the whole genome arrays or 
specific markers that have failed on 
the whole genome arrays. 

Test pharmacogenomic markers. 
An estimated 100,000 people die 
annually in the United States from 
adverse drug reactions. Although 
few drugs are labelled to require or 
recommend genetic testing, consumers 
could find specific variants 
useful. Variants in drug metabolism 


TABLE 1: PREDICTIONS FOR DISEASE RELATIVE RISKS FOR FIVE INDIVIDUALS 
[Table entries not recoverable from the source. Legible row labels: Breast cancer, Colon cancer, Crohn’s disease, Macular degeneration, Prostate cancer, Type 2 diabetes.] 
Key: ↑ increased risk (relative risk (RR) > 1.05); ↓ decreased risk (RR < 0.95); = average risk (0.95 ≤ RR ≤ 1.05). 
First prediction is from 23andMe; second prediction is from Navigenics. 
Different predictions are highlighted in beige. 


[…] condition for which predictions 
agreed between the two companies 
for all five individuals in our analysis. 
For coeliac disease, both companies 
have one strong-effect marker 
with a high odds ratio in common; 
Navigenics also reports on seven 
additional markers that 23andMe 
does not use. Thus, the number of 
markers in common does not necessarily 
correlate with better prediction 
agreement. However, the one strong-effect 
marker in common between 
both companies occurs in >90% of 
people with coeliac disease and 
its risk allele has an odds ratio of 7 
(ref. 12). The seven markers unique 
to Navigenics have modest effects 
and therefore do not affect the overall 
relative risk prediction for this 
disease as much. Generally predictions 
tended to agree when there 
was consensus on the strong-effect 
markers for a disease. 

When the DTC companies did not use the 
same strong-effect markers, we saw large dif- 
ferences in prediction. A clear example is the 
predicted disease risk for psoriasis. In one 
individual, 23andMe reports a relative risk of 
4.02, whereas Navigenics reports a relative risk 
of 1.25, more than a threefold difference. The 
difference is attributable to a marker unique to 
23andMe whose risk allele has an odds ratio of 
2.8 (ref. 13). This marker is not included in the 
Navigenics analysis because the result does not 
seem to pass Navigenics’ publication require-
ments for marker inclusion.

Another concern is the use of markers that
have uncertain odds ratio estimates. A marker
for type 2 diabetes that Navigenics uses has 
the highest odds ratio among all of Navigen- 
ics’ type 2 diabetes markers as reported in the 
literature. It therefore makes the strongest 
contribution to the overall disease prediction. 
However, Navigenics warns that the marker’s 
effect is statistically insignificant and may not 
contribute to disease. The average consumer is 
unlikely to appreciate the significance, or lack 
thereof, of this result. 

These findings lead us to propose the fol- 
lowing recommendations for a personalized 
medicine research agenda. 


Company recommendations 

Report the genetic contribution for the 
markers tested. Currently, the markers that 
have been discovered by genome-wide asso- 
ciation studies do not explain the majority of 
the genetic heritability of disease. For example,
current literature indicates that approximately
60-65% of the heritability of coeliac disease is 
still unaccounted for. Therefore, the marker set 
used to screen for disease can miss unknown 
genetic factors, leading to false negatives. We 
recommend that DTC companies report the 
proportion of the genetic contribution of a dis- 
ease that can be attributed to the markers used 
in their test, and the proportion of the genetic 
contribution that is still unknown. This is dif- 
ferent from reporting the genetic contribution 
versus the environmental contribution, which 
DTC companies emphasize on their websites. 


Focus on high-risk predictions. Most of the 
diseases predicted in the DTC reports imply 
only a modest risk compared with the average 
population (approximately 80% of reported 
relative risks lie between 0.5 and 1.5). We rec- 
ommend that DTC companies structure their 
communications with users around diseases 
and traits that have high-risk predictions. 
Customers could focus their lifestyle changes 
based on these. However, if there is a low- 
risk prediction for disease, a sense of security 
should not be assumed because much of the 
genetic contribution to disease risk has yet to 
be understood. 


Directly genotype risk markers. If the risk 
marker in the published literature is not 
directly assayed by the DTC company, DTC 
companies currently use linkage disequilib- 
rium (the non-random association of alle- 
les) to choose a surrogate risk marker. We 
genes or recommended for test-
ing by drug labels are informative and could
greatly affect an individual's treatment. Exam-
ples include variants affecting the efficacy of
clopidogrel (used to reduce the risk of stroke
or heart attack) or tamoxifen (used to treat
breast cancer). Most of the DTC companies
are testing for some pharmacogenomic mark-
ers; we encourage inclusion of as many of
these markers as possible.


Agree on strong-effect markers. DTC com- 
panies have agreed to use clinically validated 
markers for prediction, but not necessarily 
the same markers or number of markers.
This lack of consensus leads to inconsistent 
results between DTC companies. As studies 
are replicated, the number of markers and 
better estimates of their odds ratios should 
converge so that there is consensus to include a 
marker. Because these studies will take time, a 
stopgap solution is for DTC companies to agree 
on using a core set of strong-effect markers to 
achieve better prediction consensus and con- 
sistent reporting to the consumer. 


Community recommendations 

Monitor behavioural outcomes. One of 
the fundamental questions with DTC tests 
is whether they modify consumers’ behav- 
iour long term, and hence benefit lifestyle 
and health. More public studies need to be
funded to monitor behaviour resulting from 
DTC testing to identify the best strategies for 
using personal genomic data to improve an 
individual's health. Studies are currently under
way, and applying the findings will be central
to the success of DTC genome tests and cred-
ibility of the field.


Carry out prospective studies. Agreement 
on risk predictions by DTC companies does 
not necessarily imply that the predictions are 
accurate or meaningful, and at this point in 
time, we cannot determine who has the ‘best’ 
predictions. To effectively assess the clinical 
validity of these genetic tests the community 
needs more prospective studies with tens or 
hundreds of thousands of individuals that 
measure the predictive value of known mark-
ers. Such studies are useful because they
consider risk markers simultaneously, measure 
the interaction between different markers and 
do not assume a risk model. It may be practical 
to prioritize common diseases with significant 
health impact because of the large numbers of 
individuals and the expense associated with 
prospective studies. 


Replicate associated markers in other eth- 
nicities. Genome-wide association studies 
have been conducted primarily on populations 
with European ancestry. Disease-associated
markers may not transfer from one popula- 
tion to another — allele frequencies or linkage 
disequilibrium patterns may differ. There-
fore, we strongly recommend the validation of
these markers and the surrounding pattern of 
genetic variation in other ethnicities. 


Sequence rather than genotype. Eventually, 
sequencing an individual's genome will become 
economically feasible. Sequencing has an 
advantage over genotyping because it captures 
the full spectrum of an individual's variation 
and determines, rather than infers, a higher 
resolution of variants. However, identification 
of variants should not be confused with their 
interpretation, and pinpointing the causative 
disease variant will still be challenging. Our
ability to identify variants from comprehen- 
sive sequence data will far outstrip our ability 
to characterize their biological effect. However, 
accurate and complete reporting is a necessary 
predecessor to a precise functional understand-
ing of genomic data for the consumer.


Let's celebrate human genetic diversity


Science is finding evidence of genetic diversity among groups of people as well as among individuals. This 
discovery should be embraced, not feared, say Bruce T. Lahn and Lanny Ebenstein. 


A growing body of data is revealing the
nature of human genetic diversity at
increasingly finer resolution. It is
now recognized that despite the high degree of 
genetic similarities that bind humanity together 
as a species, considerable diversity exists at 
both individual and group levels (see box, page 
728). The biological significance of these vari- 
ations remains to be explored fully. But enough 
evidence has come to the fore to warrant the 
question: what if scientific data ultimately dem- 
onstrate that genetically based biological varia- 
tion exists at non-trivial levels not only among 
individuals but also among groups? In our view, 
the scientific community and society at large 
are ill-prepared for such a possibility. We need 
a moral response to this question that is robust
irrespective of what research uncovers about
human diversity. Here, we argue for the moral 
position that genetic diversity, from within 
or among groups, should be embraced and 
celebrated as one of humanity's chief assets. 
The current moral position is a sort of 
‘biological egalitarianism’. This dominant
position emerged in recent decades largely 
to correct grave historical injustices, includ- 
ing genocide, that were committed with the 
support of pseudoscientific understandings 
of group diversity. The racial-hygiene theory 
promoted by German geneticists Fritz Lenz, 
Eugen Fischer and others during the Nazi
era is one notorious example of such pseudo- 
science. Biological egalitarianism is the view 
that no or almost no meaningful genetically
based biological differences exist among human
groups, with the exception of a few superficial
traits such as skin colour. Proponents of this
view seem to hope that, by promoting biologi- 
cal sameness, discrimination against groups or 
individuals will become groundless. 

We believe that this position, although well- 
intentioned, is illogical and even dangerous, 
as it implies that if significant group diversity 
were established, discrimination might thereby 
be justified. We reject this position. Equality 
of opportunity and respect for human dignity 
should be humankind’s common aspirations, 
notwithstanding human differences no matter 
how big or small. We also think that biological 
egalitarianism may not remain viable in light of 
the growing body of empirical data (see box).

Will we soon cherish genetic diversity as we now do cultural diversity?

OPINION


Many people may acknowledge the pos- 
sibility of genetic diversity at the group level, 
but see it as a threat to social cohesion. Some 
scholars have even called for a halt to research 
into the topic or sensitive aspects of it, because 
of potential misuse of the information. Others
will ask: if information on group diversity can
be misused, why not just focus on individual 
differences and ignore any group variation? 

We strongly affirm that society must guard 
vigilantly against any misuse of genetic infor- 
mation, but we also believe that the best defence 
is to take a positive attitude towards diversity, 
including that at the group level. We argue for 
our position from two perspectives: first, that 
the understanding of group diversity can ben- 
efit research and medicine, and second, that 
human genetic diversity as a whole, including 
group diversity, greatly enriches our species. 

For scientists to understand human genetic
diversity in its totality, group diversity cannot
be shunned. It is an integral and meaningful
component of overall human diversity. For
example, the International HapMap Project,
which examines genetic diversity in several
hundred individuals, has revealed clear
genetic differentiation among the geographic
groups represented by those individuals. More
importantly, studies increasingly indicate that
understanding genetic diversity at the group
level can shed light on human evolution, the
nature and acquisition of many human traits,
including disease conditions, and how genetic
and environmental factors interact to produce
biological outcomes. Thus, to ignore group
diversity is to do poor science.


SUMMARY

• Promoting biological sameness in
humans is illogical, even dangerous

• To ignore the possibility of group
diversity is to do poor science and
poor medicine

• A robust moral position is one that
embraces this diversity as among
humanity's great assets

Neither can many medical applications of 
genetics safely ignore group diversity. It can 
facilitate the mapping of disease genes and 
lead to improved treatments. For instance,
groups can differ markedly in their ability to
metabolize certain anticancer drugs. Examin-
ing the potential genetic contributions to these
differences may illuminate how genes regulate 
drug metabolism and allow for more effective 
treatment. The ultimate goal of medical inter- 
vention may be personalized medicine (see 
page 724), whereby treatment is tailored to the 
genetic make-up of each individual, but this 
will remain a distant ideal for years to come. 
For now, to intentionally overlook the influ- 
ence of group diversity on disease susceptibili- 
ties and treatment outcomes is to practise poor 
medicine. 

In addition to the above arguments, there is a 
much larger reason to embrace human diversity 
in all its forms, in our view. Humanity’s genetic 
diversity — small or large, within or among 
groups — is a resource for, rather than a detri- 
ment to, creating a more fulfilling and prosper- 
ous society. Just as people have come over time 
to cherish cultural diversity, so we hope that
attitudes will warm towards genetic diversity.

In the natural world, genetic diversity is a 
source of evolutionary resilience and adapta- 
bility. It buffers against changing environments 
and allows species to occupy broader and more 
fluid ecological niches. Even for a single indi- 
vidual, differences between its two copies of 
the genome can often lead to higher fitness. 
Indeed, sexual reproduction is thought to have 
evolved as a way for species to take advantage 
of genetic diversity. Consequently, the loss of a
species’ diversity often threatens its long-term 
survival. The susceptibility of agro-monocul- 
ture to sudden disease outbreaks or climate 
changes is just one example. 

In humans, genetic diversity may be 
particularly beneficial at a social or cultural 
level. Humans are uniquely complex in the 
social structures they form. Although genetic 
diversity may not be a prerequisite for social 
complexity, the former can foster the latter by 
allowing individuals with different tastes and 
abilities to make professional and personal 
choices that they enjoy and in which they are 
productive, thereby leading to personal fulfil- 
ment and contributing to a more prosperous 
society. Arguably, the United States is one of 
the most innovative, successful and cultur- 
ally vibrant countries in the world. It excels 
in numerous and wide-ranging areas, such as 
art, sport, business, science and political and 
economic thought. We believe that this is due 
in part to the nation’s diversity, both cultural 
and genetic, and to a liberal environment that 
allows individuals to pursue their unique and 
varied potentials. 

Group differentiation, by furthering this
diversity, adds to the total depth and breadth of
human potential. In our view, the 2008 Olympic 
Games was a beautiful showcase of human diver- 
sity. Diversity at the individual level was evident 
from the wide-ranging physical attributes asso- 
ciated with different sports. Often athletes from 
different geographic areas also seem to excel at 
certain sports. Many of these differences may 
of course be explained by cultural and environ- 
mental influences, but should genetic variation 
contribute in any way to regional athletic ability, 
it would be hard not to see group diversity as a 
great asset of our species. 

Discussions of human genetic diversity 
inevitably touch on many sensitive issues. 
We therefore provide the following caveats to 
minimize misinterpretations of our position. 
First, the recognition that genetic diversity 
can contribute to variation in biological traits 
by no means diminishes the role of the envi- 
ronment in influencing many of these traits. 
Arguments for improving the well-being of 
individuals and groups through environ- 
mental approaches such as better nutrition, 
education, career opportunities and medical 
treatment lose none of their strength when 
embracing genetic diversity. Second, acknowl- 
edging differentiation among groups does not 
reduce the importance of diversity within 
groups, in which most human diversity seems 
to lie. Third, although we firmly believe that 
diversity is beneficial overall, we acknowledge 
that it might not always be so. For example, 
genetic diversity can lead to higher disease 
susceptibilities in some individuals or groups. 
We nevertheless believe that any downside of 
genetic diversity, including at the group level, 
does not detract from its overall benefit to our 
species. 

It is also important to recognize that human-
ity is diverse in its diversity — which is to say
that genetic diversity contributes to variation 
across numerous physical, physiological and 
cognitive domains. How individuals or groups 
fare in one domain can be largely independ- 
ent of how they fare in others. 


Emerging understanding of human genetic diversity 


Genetic diversity is the 
differences in DNA sequence 
among members of a species. 
It is present in all species 
owing to the interplay of 
mutation, genetic drift, 
selection and population 
structure. When a species is 
reproductively isolated into 
multiple groups by geography 
or other means, the groups 
differentiate over time in their 
average genetic make-up. 
Anatomically modern 
humans first appeared in 
eastern Africa about 200,000 
years ago. Some members 
migrated out of Africa by 
50,000 years ago to populate 
Asia, Australia, Europe and 
eventually the Americas.
During this period, geographic 
barriers separated humanity 
into several major groups, 
largely along continental lines, 
which greatly reduced gene 
flow among them. Geographic 
and cultural barriers also 
existed within major groups, 
although to lesser degrees. 
This history of human 
demography, along with 
selection, has resulted in 
complex patterns of genetic 
diversity. The basic unit of this 
diversity is polymorphisms — 
specific sites in the genome 
that exist in multiple variant 
forms (or alleles). Many 
polymorphisms involve just
one or a few nucleotides,
but some may involve
large segments of genetic
material. The presence
of polymorphisms leads
to genetic diversity at the
individual level such that
no two people's DNA is
the same, except identical
twins. The alleles of some
polymorphisms are also
found in significantly different
frequencies among geographic
groups. An extreme
example is the pigmentation
gene SLC24A5. An allele of
SLC24A5 that contributes to
light pigmentation is present
in almost all Europeans but is
nearly absent in east Asians
and Africans.

Given these geographically 
differentiated polymorphisms, 
it is possible to group 
humans on the basis of 
their genetic make-up. Such 
grouping largely confirms 
historical separation of global 
populations by geography.
Indeed, a person's major
geographic group identity
can be assigned with near
certainty on the basis of
his or her DNA alone (now 
an accepted practice in 
forensics). There is growing 
evidence that some of the 
geographically differentiated 
polymorphisms are functional, 
meaning that they can lead to 
different biological outcomes 
(just how many is the 
subject of ongoing research). 
These polymorphisms
can affect traits such as
pigmentation, dietary
adaptation and pathogen
resistance (where evidence
is rather convincing),
and metabolism, physical
development and brain biology
(where evidence is more
preliminary).

For most biological 
traits, genetically based 
differentiation among groups is 
probably negligible compared 
with the variation within the 
group. For other traits, such 
as pigmentation and lactose 
intolerance, differences among 
groups are so substantial 
that the trait displays an 
inter-group difference that 
is non-trivial compared with 
the variance within groups, 
and the extreme end of a trait 
may be significantly over- 
represented in a group. 

Several studies have shown 
that many genes in the human 
genome may have undergone 
recent episodes of positive 
selection — that is, selection 
for advantageous biological 
traits. This is contrary to
the position advocated by
some scholars that humans
effectively stopped evolving
50,000-40,000 years ago.
In general, positive selection 
can increase the prevalence 
of functional polymorphisms 
and create geographic 
differentiation of allele 
frequencies. B.T.L. & L.E.


“Geographic group identity can be assigned on DNA alone.”

For example, although IQ is a
useful metric of some aspects
of intelligence and it is partly
heritable, it is far from a com-
plete measure of total mental
capacity. Therefore, acceptance
of human genetic diversity in its
totality necessarily leads to the rejection of uni-
dimensional rankings of the capacity of human
individuals or groups. If anything, the study of
genetics is taking us towards an ever greater
appreciation of the multidimensional nature
of human potential.

Genetic diversity is a strength not a weakness
of humanity. It is time to acknowledge, embrace
and celebrate this strength. There is nothing sci-
entifically improbable or morally reprehensible
in the position that people, including groups of
people, can be genetically diverse.
Those who deny or even condemn
human diversity adopt a stance
that is both factually doubtful and
morally precarious. On the whole,
humanity has been and will be
stronger, not despite our differ-
ences, but because of them.
Bruce T. Lahn is in the Department of Human 
Genetics, University of Chicago, Illinois. Lanny 
Ebenstein is in the Department of Economics, 
University of California at Santa Barbara, California. 
e-mail: blahn@bsd.uchicago.edu
Have your say at http://go.nature.com/I76Rzs.
See also Editorial, page 697,
and online at http://go.nature.com/VqPUE2.


Winning the arguments on Capitol Hill


Harold Varmus enjoys a guide to the inner workings of the US Congress by legislator Henry Waxman. 


The Waxman Report: 

How Congress Really Works 

by Henry Waxman with Joshua Green 
Twelve: 2009. 256 pp. $24.99 


Most US scientists who are politically engaged 
on behalf of their profession have one objec-
tive: to enhance the budgets of their funding
agencies. Their heroes are supportive advocates 
and congressional appropriators. But the rest 
of what Congress does may seem irrelevant, 
irrational or even mysterious. 

In his first book, Henry Waxman — a 
Democrat, a member of the US House of Rep- 
resentatives since 1975, and one of the most 
accomplished legislators of our time — gives a 
useful corrective, focusing on policy and over- 
sight, not just the money. The Waxman Report 
is a welcome guide for those who wish to learn 
more about the complex intersections of sci- 
ence and government, as the author describes 
his legislative fights against tobacco, HIV/AIDS 
and the use of steroids in sports; and his advo- 
cacy of food nutrition labelling, clean air and 
drugs for rare diseases. 

Henry Waxman (below left) made tobacco executives testify in Congress in 1994 to sway public opinion.

Waxman represents the district that includes
Hollywood, California, but he would not be
called glamorous in appearance or style.
He does, however, share other traits with his
district's most famous industry — an aptitude
for dramatic staging, an appetite for intriguing
strategies and a recognition of star power.
Some of his most stirring moments have come
when using his hearing room as a stage to
assemble powerful figures — from captains of
industry to sports heroes — to expose deceptions
that threaten public health or the environment.
In this sense, he more closely resembles a
morally driven film director than a committee
chairman.

Waxman’s legislative successes have often
depended on understanding the importance of
public support, shrewdly assessing how to get it,
and effectively transmitting the message to key
people. In 1994, he knew that Congress would
not give the US Food and Drug Administration
(FDA) any regulatory authority over tobacco
products. But he also knew that he could
move public opinion in the right direction by
bringing the heads of tobacco companies to
testify before Congress and then asking
embarrassing questions. That famous hearing — and
the iconic photograph of the mass swearing-in
of chief executives — helped to build public
support for the extended powers over tobacco
products recently granted to the FDA by Congress.

“Landmark legislation can be attained through organization, skill and hard work.” — Henry Waxman

Some years earlier, when Waxman’s bill to
provide tax benefits for companies that made
drugs for rare illnesses was threatened with a
Senate defeat or a presidential veto, he asked
friends in Hollywood to produce a television
show that dramatized the problem, and asked
others to lobby President Ronald Reagan at a
holiday party. In this way, the Orphan Drug Act
became law in 1983. He also praises
Edward Kennedy’s naming of
one of the first major pieces of HIV/AIDS legis- 
lation in 1989 after Ryan White, a young patient 
with haemophilia who had been infected by a
blood transfusion and who happened to live 
in a midwestern state represented by a senator 
whose vote was crucial. Even homophobic leg- 
islators were unlikely to oppose the Ryan White 
CARE Act. 

Gimmicks, of course, do not work on their 
own. Waxman’s successes have required a pas-
sion for progressive policy, patience, persistence
and a deep knowledge of his subjects — traits
that are all too rarely displayed in Congress 
these days. Waxman preaches a seemingly 
naive optimism, noting that “no matter how 
gloomy the outlook or fearsome the opposi- 
tion ... landmark legislation can be attained 
through organization, skill, and hard work”. In
fact, his victories have often depended on savvi- 
ness as well as on industry. As a proponent of 
compromise with his opponents, he writes of 
the virtue of being open to “unlikely alliances”. 
And as a tactician, he notes that whereas big 
issues generate noise, they often have little effect 
on ordinary people's lives. Smaller issues such as 
food labelling, he explains, may “fly under the 
radar, but... have a revolutionary impact”. 

Waxman is tough and pragmatic, as well as 
clever and idealistic. He defends his provision of 
campaign funds to fellow Democrats who might 
later support his bills, saying that it is “useful to 
think of money as a political fact of life”. He 
speaks frankly about his opponents’ faults and 
about his own occasional missteps, such as the 
day he yielded to an unfortunate compromise 
on the labelling of dietary supplements. And 
he recounts how he has made use of procedural 
tactics to achieve his ends, such as bringing the 
legislative process on a colleague’s weak ‘clean 
air’ measure to a near-standstill. 

Despite differences in social background, 
Waxman has much in common with the late 
Edward Kennedy. Recently eulogized as per- 
haps the most effective senator of the modern 
era, Kennedy was heir to a familial political 
tradition, entering the Senate with ease at a 
young age despite little experience. By contrast,
Waxman was raised by a struggling Jewish
family working its way out of the Depression in 
Los Angeles. After fighting his way, with a few 
lucky breaks and a law degree, to the California 
State Assembly in 1968, he positioned himself 
in 1974 for a newly created seat in the House 
of Representatives and won a subcommittee 
chairmanship only five years later. 

In their respective halls of Congress, Kennedy 
and Waxman became similarly known as mas- 
ters of the legislative process, combining liberal 
political ideals with a willingness to work with 
opponents to get things done. During long 


careers, both have produced remarkable legis- 
lative records in domains in which science is 
important, including health care and regulatory 
policy — yet without ever serving on those all- 
powerful appropriations committees. In this 
slim volume, we learn how Waxman did it.

Harold Varmus is a former director of the US
National Institutes of Health and author of The
Art and Politics of Science. He is president of
the Memorial Sloan-Kettering Cancer Center,
New York, and a co-chair of President Obama's
Council of Advisors on Science and Technology.
e-mail: varmus@mskcc.org


China’s unofficial democracy


The Power of the Internet in China:
Citizen Activism Online

by Guobin Yang

Columbia University Press: 2009. 320 pp.
$29.50, £20.50


In July this year, a 20-year-old university
student in the southern Chinese city of
Hangzhou was sentenced to three years in prison
for driving recklessly and killing a pedestrian.
This would have been a sad but unremarkable
case, except that it was only brought following
a huge national outcry. Reports that local police
initially protected the student, whose family was
well connected, were spread over the Internet
and eventually forced the police to respond.

Similar examples of online citizen activism
occur every day. The Power of the Internet in
China analyses how the Internet's rapid devel-
opment in China has given its citizens a mecha-
nism to air and share individual opinions that
may differ from official positions, to connect
and organize often against the will of the author-
ities, and to improve their own lives directly and
visibly. The Internet allows Chinese citizens to
practise, as cultural critic Raymond Williams
termed it, “unofficial democracy”.

In researching the book, Guobin Yang, a
professor at Columbia University who grew
up in China, read Chinese material first-hand,
observed and participated in online forums
and interacted with Chinese citizens online.
The book's 70 case studies range from patients
with diabetes or hepatitis B fighting against
governmental employment discrimination, to
Internet-organized worldwide demonstrations
in response to the 1998 Indonesian atrocities
towards the local ethnic Chinese population, to
massive online and offline protests over news
reporting by Western media in the run-up to
the 2008 Beijing Olympics.

Yang’s recounting of notable events along 
the historical path to China’s online activism 
brought back old memories of my own. The 
first electronic gathering place targeted at people 
interested in China — the USENET newsgroup 
soc.culture.china — was started soon after I 
left Beijing for Cambridge, UK, in late 1987. I 
quickly became an active participant, devot- 
ing entire mornings to reading and replying to 
postings. As a student, I helped edit China News
Digest, the first China-themed English-language 
electronic newsletter, which was published free 
by e-mail. 

China's online community has found its own voice.

The milestone event for the citizens’ Internet
inside China was the founding in 1995 of the
Tsinghua Bulletin Board System (BBS), which 
was started by students at the computer-science 
department of Tsinghua University, where I 
was an undergraduate. Even today, with the 
prevalence of text messaging, blogs, YouTube 
and Twitter, the BBS continues to be a widely 
used online platform in China, and its under- 
lying technology has progressed from dial-up 
connections to broadband networks. 

Although filled with vivid anecdotes, this 
book is an academic publication. Its story- 
telling is punctuated by jargon and scholarly 
narratives, including numerous academic ref- 
erences. Nonetheless, it is a valuable informa- 
tion resource. Yang’s analysis covers a broad 
canvas and includes many statistics. The inves- 
tigation into the business side of online activ- 
ism will particularly fascinate many readers. 
Online viewings surely translate into money, 
and manufactured online contention generates 
lots of viewings. Some businesses, including art 
dealers, present items as ‘banned in China’ to 
promote their wares. Also a reality are competitive
tactics, such as the ‘50 cents party’ — people
who are paid 50 cents an item for posting pre- 
scribed messages at online forums. 

Governmental control of content is the 
elephant in the room. The mechanisms for 
restricting content flow into China and for controlling
domestic Internet content — down to a
single book entry on Amazon, for example — 
have become sophisticated in recent years. This 
is aided by the fact that only a few state-owned 
access points connect the domestic Internet to 
the outside world. Chinese ‘netizens’ counter 
these constraints with ingenuity, such as using 
Internet proxies to bypass state firewalls, or 
posting opinions in unrelated forums to post- 
pone detection. The Chinese habit of reposting 
— in which a user copies an article in its entirety 
to a new forum, rather than linking to the original
posting — makes the job of eradicating an
erratic blog much harder. 

Sixteen years ago this month, media mag- 
nate Rupert Murdoch declared that “advances 
in the technology of telecommunications have 
proved an unambiguous threat to totalitarian 
regimes everywhere”. Last year, China overtook
the United States as the country with the larg- 
est online population. In the time between, 
Yang’s book documents how China’s netizens
have stumbled on online activism as a response 
to, among other things, a flawed justice sys- 
tem. Time will tell whether the revolution in 
communication technologies will lead to a new
cultural or social revolution.
Li Gong is chairman and chief executive of 
Mozilla Online, 21 Jian Guo Men Wai Avenue, 
Chaoyang, Beijing 100020, China. 
e-mail: lgong@mozilla.com 
Darwin's legacy down under


Reframing Darwin: Evolution and Art in
Australia 

Ian Potter Museum of Art, Melbourne,
Australia 

Until 1 November 


Australia’s unusual fauna and flora, encoun- 
tered by Charles Darwin during his eagerly 
anticipated visit of 1836, surely influenced 
his evolutionary thoughts. Yet writing in his 
Journal of Researches (later known as Voyage 
of the Beagle), he focused more on Australia’s 
human inhabitants — its convicts, settlers and 
Aboriginal people — than on its natural his- 
tory. Nevertheless, as the exhibition Reframing 
Darwin at the Ian Potter Museum of Art in 
Melbourne shows, Darwin's legacy for science 
and art in Australia is great. 

The exhibition includes diverse pieces, from 
fine images of HMS Beagle and Australia at the 
time of Darwin's visit, to a turn-of-the-century 
undergraduate exam paper containing a ques- 
tion about Darwinian concepts — set at the 
University of Melbourne by Walter Baldwin 
Spencer, who was appointed foundation chair 
of biology in 1887. Tom Roberts’s power- 
ful portrait Aboriginal Head-Charlie Turner 
(1892) conveys great emotion, which was unu- 
sual for its time, and may have been a response 
to Darwin's The Expression of the Emotions in 
Man and Animals. Other artworks include 
Emmanuel Frémiet’s shocking bronze statue 
Gorilla Carrying Off a Woman — a gift in 1907
from the artist to the National Gallery of Victoria
— that is juxtaposed in the gallery with Julie
Rrap’s unsettling digital pictures of women’s
bodies that have been enhanced to comment 
on Darwin's theory of sexual selection. 

Also displayed are several works by Syms 
Covington, Darwin's servant aboard the Beagle. 
In a letter to his sister, Darwin offers a brief and
unflattering description of Covington, who 
was a potential witness to the evolution of Dar- 
win’s key idea. This characterization inspired 
the embryo of Mr Darwin's Shooter (Random 
House, 1998), the critically acclaimed novel 
by Australian author Roger McDonald, which 
places the challenging idea of natural selection 
in exquisite perspective. 

The precise layout of the Beagle 
is recorded in Philip Gidley King’s 
ink sketches of the upper, lower and 
quarterdeck. King, an Australian- 
born midshipman who served aboard 
the vessel, drew them from memory 
when he was 73. His sketches were 
used by a Melbourne craftsman to build an 
exact and finely crafted replica of Darwin's 
modest cabin — 2.7 metres wide by 1.5 metres 
deep by 1.8 metres high — a workspace that 
Darwin shared with King, John Lort Stokes and 
hundreds of books. 

Reframing Darwin highlights two water- 
colours of the Beagle in the imposing Chilean 
landscape of Valparaiso Bay. Originally attrib- 
uted to artist Conrad Martens who joined the 
ship at Montevideo in Uruguay, inconsistencies 
in the palette and composition had long puz- 
zled art historians. However, recent auctions 
in Santiago and London of pictures by the 
little-known English artist Carlos Chatworthy
Wood Taylor, also known as Charles C. Wood,
suggested that the watercolours are the work
of Wood, who lived in Chile for some 30 years.
It remains unclear why Beagle captain Robert
FitzRoy commissioned them.

Augustus Earle’s 1826 painting of Australia’s Blue Mountains — where Darwin walked a decade later.

Shown for the first time in public are ten 
watercolours by Louisa Anne Meredith, who 
was born in Birmingham, UK, in 1812 and 
migrated to Australia in 1839. Meredith is best 
known for her botanical illustrations, so the 
vivid images of Tasmanian fish are a surprise. 
The paintings’ extraordinary detail challenges 
the view that nineteenth-century female illus- 
trators merely pursued the picturesque. Colo- 
nial artists such as Meredith made a 
substantial contribution to our early 
understanding of Australian natu- 
ral history by enhancing the lifeless 
specimen collections with living 
images. Meredith corresponded with 
many scientists, including the botanist 
Joseph Hooker who was a friend of Darwin, and 
became a respected authority on Tasmanian 
natural history. It is remarkable that these lovely 
paintings have been hidden away for so long. 

Colonial Tasmania's sorry history with 
Aboriginal people did not pass unnoticed by 
Darwin, who anticipated the decline in the 
indigenous population. Five haunting monochrome
watercolours of Tasmanian Aboriginals,
painted by Thomas Bock in 1837, the year
after Darwin's visit, reflect a respect by the art- 
ist that contrasts with the distasteful popular 
views of that time. Tom Roberts’s portraits of 
Aboriginals echo a similar sensitivity. 

For those who miss this terrific exhibition, 
its themes are explored in a beautifully illus- 
trated book by Jeanette Hoorn that also serves 
as a catalogue (Reframing Darwin, Miegunyah 
Press; 2009). It includes a biography of Baldwin 
Spencer, arguably Australia’s first evolutionary 
biologist, and explains what compelled Frémiet 
to create his bizarre and compelling Gorilla. We 
also learn that Darwin’s apparent indifference 
to collecting Australian flora and fauna was not 
down to a lack of time or interest, but to the fact 
that French naturalists had already done so. 

Darwin's new framework for understand- 
ing life generated vigorous debate among 
scholars of science and letters at the time. 
Reframing Darwin stimulates that discussion 
once more.
Mark A. Elgar is an evolutionary biologist in the 
Department of Zoology, University of Melbourne, 
Victoria 3010, Australia 
e-mail: m.elgar@unimelb.edu.au 


For more on Darwin, see www.nature.com/darwin. 
A creative celebration of evolution


Burning Man 2009: Evolution 
Black Rock Desert, Nevada 
31 August-7 September 2009 


The Burning Man festival is a unique happen- 
ing. For one week in September every year, 
the featureless Black Rock Desert in Nevada 
hosts a temporary community of artists, tech- 
nologists and visionaries. Lacking paved roads, 
water, electricity and any permanent structures, 
Black Rock City emerges from the ephemeral 
lakebed, or playa, with a population of nearly 
50,000. Afterwards, it disappears without trace, 
only to be reconfigured a year later. 

Fittingly for the 2009 iteration of this social
experiment, this year’s theme was ‘Evolution’.
In the 23 years that Burning Man has been
replicating, certain behaviours have been selected
for by the inhabitants: […]ance, self-reliance
coupled with extreme altruism, a gift economy
and a leave-no-trace environmental ethic. Add
intense creativity, conscious participation,
ingenuity and a propensity for hedonism, and
the outcome is an unparalleled celebration of
the human spirit.

The principal vehicle is art, from giant
sculptures and lavish pyrotechnics to countless
instances of the most basic art of human
interaction: giving and receiving. The ‘man’
effigy is the centre of the festival, both
figuratively and literally. This year, the 12-metre
human shape hovered over a thorny forest
— a tangled bank — atop a giant double
helix. The DNA molecule provided a powerful
artistic meme, representing both life’s
capacity to evolve through genetics,
and perhaps something that needs
to be overcome through non-genetic
evolutionary paths. Viewed from a
different angle, the man seemed to
float above a field of sea lilies, placing
this celebration of human consciousness
in an ancient evolutionary context.

The most striking image at this year’s Burning
Man, expressed in various ways across the
city, was the famous “ascent of man” progression
from great ape through to modern human, with
the Burning Man icon representing the next
step. This sequence resonated with the advance
in human culture realized in Burning Man.
One vision was the Fishbug, Chimera sententia,
a creature rising out of the playa with an arthropod
tail, amphibian body, mammalian trunk
and oversized primate brain.

We created a zone at Burning Man that […]
atavisms — reappearances of past events in
new contexts — in human social evolution.
At our Atavism Camp we created “The Spandrel”,
a shade structure built with materials salvaged
from the ‘boneyard’ at the University
of Washington's Friday Harbor Marine Lab:
leftover materials from past experiments, now
reborn for a new purpose. At a symposium
entitled “Evolution and Society”, we asked how
society has interpreted evolution and whether,
despite its shadowy past, its principles can
guide us to a much-needed behavioural shift
towards sustainability.

In the rampant transfer of culture at Burning
Man, on a par with endosymbiotic events, we
see hope. Evolution is evoked here on many
levels: the adaptation and thriving of the
individual in this extreme environment,
the various camps as interactive
and artistic spaces, the city as it
alters over the seven days and from
year to year, exhibiting emergent
properties of altruism, shared community
and free expression. ‘Burners’
become extremophiles. With resources scarce
in the desert, intense sharing is the most efficient
practice, suggesting that humans may yet
realize a sustainable evolutionary trajectory.

A sporadic eruption of desert art nods to humans' ability to adapt.

Next year’s theme of ‘Metropolis’ moves 
the festival a step further. Cities embody the 
best and worst of humanity, and Black Rock 
City is no exception. With its preponder- 
ance of oversized gas-guzzling camper vans, 
fossil-fuel-powered generators and gratuitous 
combustion, it is no Utopia. But the City’s 
Alternative Energy Zone, with its huge bank 
of solar panels, multiple experiments in grey- 
water evaporation, and wind-powered cocktail 
bar, is paving the way. 

Exodus from the barren plain brings us to 
the comparative paradise of juniper, sage and 
pinyon jays. Likewise, evolution beyond Burn- 
ing Man embodies what happens off the playa, 
how we share and act upon our experiences.
Jason Hodin¹, Cory D. Bishop², Fred A. Sharpe³
and Ruben E. Valas⁴ are evolutionary biologists.
¹Hopkins Marine Station, Stanford University,
California, USA.
e-mail: seastar@stanford.edu
²Dalhousie University, Halifax, Nova Scotia,
Canada. ³Alaska Whale Foundation, Seattle,
Washington, USA. ⁴University of California, San
Diego, La Jolla, California, USA.


For more on evolution, see www.nature.com/ 
darwin. 
HUMAN GENETICS


Sharp focus on the variable genome 


John A. L. Armour 


Copy-number variation — deleted or duplicated regions of DNA — is widespread in the human genome. 
A systematic population survey of the common variants provides an invaluable resource for further studies.


What makes people different? Much of the 
answer comes from inherited differences, and 
interpreting the extensive variation between 
people's genomes is a necessary part of under- 
standing the human genome. Variation in the 
form of single base changes (single nucleotide 
polymorphisms, SNPs), and repetitive DNA, 
is already well documented. Adding an extra 
dimension to human genetic variation is the 
increasingly evident prevalence and func- 
tional importance of copy-number varia- 
tion. Although most human DNA is present 
in exactly two copies per cell — one from 
each parent — some regions can be variably 
duplicated or deleted, leading to population 
variation in the number of copies inherited 
by different individuals. In a Nature paper 
that has just appeared online, Conrad et al.¹
report a working map for frequent human 
copy-number variation. It is a landmark 
in providing an unprecedented combina- 
tion of completeness and spatial resolution, 
and is likely to stand as a definitive resource 
for years. 

This, though, is by no means the first 
genome-wide survey of human copy-number 
variation. Previous investigations involving
a technique called array-CGH — comparative 
genomic hybridization to microarrays of DNA 
targets — have detected numerous examples of 
copy-number variants (CNVs). Array-CGH
involves hybridizing fluorescently labelled 
genomic DNA from the test individual simul- 
taneously with DNA from a reference indi- 
vidual (also labelled, but differently) to a set of 
‘target’ DNA sequences from different parts of 
the human genome. Where test and reference 
DNAs both have the same numbers of copies of 
a DNA sequence, that target will give a stand- 
ard ratio of signals from the two fluorescent 
labels. If there is a different copy number, the 
ratio will shift — for example towards a lower 
representation of the test sample label for a 
region in which there is a deletion (Fig. 1). 
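
The size of the expected shift follows from simple arithmetic. The sketch below assumes an idealized, noise-free experiment in which each label's signal is proportional to copy number and the reference individual carries two copies; the function name is illustrative and is not taken from any published analysis.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) signal for one array-CGH target.

    Assumes signal is strictly proportional to copy number (no noise,
    no saturation), which real hybridization data only approximate.
    """
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log ratio")
    return math.log2(test_copies / reference_copies)


# One copy lost, no change, and one copy gained relative to a two-copy reference.
for copies, label in [(1, "deletion on one chromosome"),
                      (2, "same as reference"),
                      (3, "duplication on one chromosome")]:
    print(f"{label}: log2 ratio = {expected_log2_ratio(copies):+.3f}")
```

Under these assumptions a deletion on one chromosome halves the test signal (a log2 ratio of -1), whereas a single-copy gain raises it by a smaller amount, which is one reason duplications are harder to see than deletions.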

However, because comparative hybridization 
has hitherto been measured using relatively 
large pieces of DNA, the extent of DNA involved 
in a deletion or duplication has often been 
defined imprecisely. Consequently, there are 
real difficulties in interpreting precise location
in current databases of CNVs; for example, are
two independent reports of CNVs in approximately
the same place detecting different variants
or simply rediscovering the same one?
A more precise alternative for discovering
CNVs uses DNA sequencing to identify non-standard
sequences around the junctions of
deletions or duplications. But even with the
power of current sequencing technologies,
relatively few individuals can be thoroughly
surveyed using this method.

Figure 1 | Demonstrating copy-number variants (CNVs) related to disease. This CNV, revealed in
Conrad and colleagues’ data¹ by a sharp local reduction in comparative fluorescence intensity of a
sample designated NA11995, is a deletion of about 20 kilobases that occurs upstream of the IRGM
gene. The deletion affects one of the two copies of chromosome 5 in NA11995. It is known to influence
disease predisposition, in that carriers of the deletion have a significantly increased risk of developing
Crohn's disease. The data for this figure were downloaded from web resources provided at
www.sanger.ac.uk/cgi-bin/humgen/cnv/42mio/downloadBigDB.cgi.

Conrad et al.¹ solve the problem of comprehensively
defining variation at high precision
by introducing a step-change in the spatial res- 
olution of genome-wide array-CGH. Despite 
the problems imposed by repetitive DNA in 
the human genome, their survey examined 
comparative hybridization at no fewer than 
42 million locations, using a short, synthetic 
DNA target for each location tested — an aver- 
age spacing of about 56 base pairs. The result 
is comparative intensity data for each synthetic 
target, which can be analysed for evidence of 
deletion or duplication. These data were noisy 
(and so needed to be averaged over several 
neighbouring probes to be reliable), but in 
practice the high density of coverage closes the 
gap between previous hybridization approaches 
and sequence-based discovery methods. This
high-resolution platform was used to survey 
DNA from 40 unrelated individuals (20 Afri- 
cans and 20 Europeans), giving a probability of 
better than 95% of finding CNVs present at a 
frequency of 5% or more. 
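
A rough check on that figure is to ask how often a variant would be present at all in a sample of this size. The sketch below assumes that the 40 individuals contribute 80 independently sampled chromosomes from a single population and that the assay never misses a variant that is present; both are simplifications, so the result is an upper bound on detection rather than the authors' own calculation.

```python
def prob_variant_sampled(frequency: float, individuals: int, ploidy: int = 2) -> float:
    """Chance that at least one sampled chromosome carries the variant.

    Assumes independent chromosomes drawn from one population and
    ignores assay sensitivity (illustrative simplifications).
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    chromosomes = individuals * ploidy
    return 1.0 - (1.0 - frequency) ** chromosomes


p = prob_variant_sampled(frequency=0.05, individuals=40)
print(f"Probability a 5% variant appears among 80 chromosomes: {p:.2%}")
```

On these assumptions the sampling probability comes out at about 98%, which leaves some room for imperfect detection while remaining consistent with the quoted figure of better than 95%.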

Even applying conservative criteria for infer- 
ring CNVs, requiring ten consecutive targets 
to agree in reporting a deletion or duplication, 
nearly 12,000 putative variants were initially 
identified, with each individual tested differ- 
ing in copy number from the reference sample 
at more than 1,000 distinct sites. More than 
8,000 CNVs were then firmly established using 
a variety of validation methods — most significantly,
samples from the Wellcome Trust Case-
Control Consortium disease-association study
were independently typed for the CNVs, the 
results of which will be reported separately. 
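
The 'ten consecutive targets' rule can be pictured as a search for long runs of probes that all point the same way. The sketch below is a simplified illustration, not the authors' algorithm: the per-probe threshold of 0.3 on the log2 ratio and the function names are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def classify(value: float, threshold: float) -> Optional[str]:
    """Label one probe; the symmetric threshold is an illustrative assumption."""
    if value <= -threshold:
        return "deletion"
    if value >= threshold:
        return "duplication"
    return None


def call_cnv_segments(log2_ratios: Sequence[float],
                      threshold: float = 0.3,
                      min_run: int = 10) -> List[Tuple[int, int, str]]:
    """Return (first_probe, last_probe, kind) for runs of at least min_run probes."""
    segments: List[Tuple[int, int, str]] = []
    run_start = 0
    run_state: Optional[str] = None
    # A trailing None label closes any run that reaches the last probe.
    labels = [classify(v, threshold) for v in log2_ratios] + [None]
    for index, label in enumerate(labels):
        if label != run_state:
            if run_state is not None and index - run_start >= min_run:
                segments.append((run_start, index - 1, run_state))
            run_start, run_state = index, label
    return segments


# Twelve probes agree on a deletion; four probes hint at a duplication.
profile = [0.0] * 5 + [-1.0] * 12 + [0.0] * 5 + [0.6] * 4 + [0.0] * 3
print(call_cnv_segments(profile))
```

Only the twelve-probe deletion is reported here; the four-probe duplication falls short of the run length required and is discarded, which shows how a conservative rule trades sensitivity to small variants for fewer false calls.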

Collectively, the CNVs overlap about 13% 
of human genes. Some deletions remove entire 
genes; others will cause loss of gene function 
via frameshifts, in which the triplet DNA cod- 
ing register is shifted backwards or forwards. 
Deletions or duplications, especially those 
affecting an entire gene, have a higher a priori 
probability of affecting the gene’s function than 
individual SNPs. Conrad et al.¹ immediately
applied their new data to investigate the potential
role of CNVs in disease, by cross-checking
SNPs implicated in previous human disease 
studies against SNPs they found to be associ- 
ated with CNVs. Could a local CNV be the 
real cause of some of these predispositions 
to disease (with the SNP acting as an indirect 
reporter)? If so, the SNP identified as over- 
represented in disease should correlate with 
chromosomes carrying a CNV. Reassuringly, 
this survey for CNV-SNP-disease associations 
produced a list including three well-established 
examples — CNVs associated with Crohn's
disease (Fig. 1), psoriasis and obesity. Other
CNVs on the list then become strong candi- 
dates for constituting the functional basis of 
the observed associations of SNPs with other 
disorders. Although these might be invaluable 
leads for understanding particular disorders, 
the authors are clear that the CNVs cannot 
solve the ‘missing heritability’ problem: in 
even the best-worked cases of disorders for 
which genetic predispositions have been char- 
acterized, most of the total risk attributable to 
genetic factors remains unexplained. 
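
The cross-check described above depends on measuring how closely a SNP allele and a CNV travel together on the same chromosomes. One common measure is the squared correlation r² between the two sites; the sketch below computes it for made-up haplotype data and is an illustration of the idea, not a reconstruction of the authors' analysis.

```python
from typing import Sequence


def r_squared(snp_alleles: Sequence[int], cnv_alleles: Sequence[int]) -> float:
    """Squared correlation between two biallelic sites scored per chromosome.

    Each sequence holds 0 or 1 for every chromosome: 1 means the
    disease-associated SNP allele (or the CNV) is present.
    """
    n = len(snp_alleles)
    if n == 0 or n != len(cnv_alleles):
        raise ValueError("need two equal-length, non-empty allele lists")
    p_snp = sum(snp_alleles) / n
    p_cnv = sum(cnv_alleles) / n
    p_both = sum(1 for s, c in zip(snp_alleles, cnv_alleles) if s and c) / n
    d = p_both - p_snp * p_cnv  # excess co-occurrence over chance
    denominator = p_snp * (1 - p_snp) * p_cnv * (1 - p_cnv)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denominator


# Hypothetical data: the SNP allele marks exactly the chromosomes with the CNV.
tagged_snp = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cnv = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical data: the SNP allele occurs independently of the CNV.
unlinked_snp = [1, 1, 0, 0]
unlinked_cnv = [1, 0, 1, 0]

print(r_squared(tagged_snp, cnv))
print(r_squared(unlinked_snp, unlinked_cnv))
```

A value near 1 means the SNP is an almost perfect reporter for the CNV, so a disease association found with the SNP could equally reflect the CNV; a value near 0 means the two are unrelated.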

This study has not found all human CNVs 
— the smallest CNVs, the less frequent CNVs 
and those embedded in complex, repetitive 
DNA will all have had a good chance of escaping
detection. But Conrad et al.¹ will have discovered
and characterized nearly all the CNVs
big enough and frequent enough to matter, 
probably including many that will prove to be 
involved in disease. 

The authors also provide superb resources 
that will allow other researchers to use their 
data to find out more. These include a detailed 
listing of the genomic locations of the CNVs 
found, the genotypes of reference individuals 
and (most useful of all) a web-based archive 
of (nearly) raw data from the original 40 com- 
parative hybridization experiments. Making 
hybridization data freely available allows oth- 
ers to undertake detailed analyses of specific 
regions, for example to investigate potential 
variants not meeting the strict criteria imposed 
in this study. The Single Nucleotide Polymor- 
phism database (dbSNP) and International 
HapMap Project provide essential data for 
research into SNPs. Information from this 
study¹ will likewise become the first-line source
of CNV data for investigating human variation,
genome evolution and disease genetics.
John A. L. Armour is at the Institute of Genetics, 
University of Nottingham, Queen's Medical 
Centre, Nottingham NG7 2UH, UK. 
e-mail: john.armour@nottingham.ac.uk 
QUANTUM MECHANICS


Passage through chaos 


Daniel A. Steck 


A quantum system can undergo tunnelling even without a barrier to tunnel 
through. The latest experiments visualize this process in exquisite detail, 
completely reconstructing the state of the evolving system. 


Reconciling quantum mechanics with classical, 
Newtonian physics has been a long-standing 
challenge. A major aspect of this challenge 
pertains to chaotic systems — simple and 
deterministic classical systems that never- 
theless display complex, seemingly random, 
unpredictable behaviour. The problem of 
‘quantum chaos’ is this: take a chaotic system,
study its (simplest) quantum counterpart,
and what you don’t find is any of the unpredictable,
chaotic behaviour from the classical
world. This is a funny thing, because you can 
go into any toy store and see any number of 
chaotic, pendulum-like devices, dynamically 
waving about for the amusement of children 
everywhere. In principle, a physicist should be 
able to model these toys either as Newtonian 
collections of interacting rigid bodies, or as 
ensembles of manifestly quantum-mechani- 
cal atoms. The answer should be the same in 
either case, except, however, that the chaos 
seems to be missing from the quantum side of 
the picture. Now Jessen and colleagues¹, writing
on page 768 of this issue, have experimentally
studied the behaviour of the quantum
version of a chaotic system with an unprec- 
edented level of precision and detail, provid- 
ing new insight into the quantum-classical 
boundary. 

To understand Jessen and colleagues’ experiments¹,
first consider what happens to an
ensemble of atoms from the classical perspec- 
tive. It is the angular momentum of the atoms 
that we are concerned with — technically an 
abstract quantity, but it suffices to think of the 
‘orientation’ of an atom as its axis of rotation.
Fixing the magnitude of the angular momentum, 
we can represent the orientation of each atom 
as a point on a sphere (Fig. 1a). The experi- 
ments transform the orientations of the atoms 
in two parts: the first is a ‘twist’, in which a carefully
tuned laser pulse shears the points on the
sphere (Fig. 1b), and the second is a rotation 
caused by a magnetic-field pulse (Fig. 1c). The 
authors’ sequence of twist/turn transforma- 
tions on the atoms realizes for the first time 
the ‘kicked top’ (Fig. 1), one of the simplest yet 
most important model systems for studying 
quantum chaos. 
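
The classical twist/turn map is simple enough to write down directly. The sketch below treats an orientation as a unit vector, twists it about the z axis by an angle proportional to its z component (so that the equator stays put and points towards the poles rotate most), and then rotates it about the y axis. The choice of axes, the function names and the parameter values are illustrative assumptions and are not the settings used in the experiment.

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def twist(v: Vector, strength: float) -> Vector:
    """Rotate about z by an angle proportional to the z component."""
    x, y, z = v
    angle = strength * z
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def turn(v: Vector, angle: float) -> Vector:
    """Rigid rotation of the whole sphere about the y axis."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def kicked_top_step(v: Vector, strength: float, angle: float) -> Vector:
    """One iteration of the map: twist first, then turn."""
    return turn(twist(v, strength), angle)


def iterate(v: Vector, strength: float, angle: float, steps: int) -> Vector:
    """Apply the twist/turn map repeatedly."""
    for _ in range(steps):
        v = kicked_top_step(v, strength, angle)
    return v


# Illustrative parameters only: twist strength 2.0 and a quarter-turn rotation.
start = (0.6, 0.0, 0.8)
end = iterate(start, strength=2.0, angle=math.pi / 2, steps=10)
print(end, math.sqrt(sum(c * c for c in end)))
```

Because both steps are rotations, the vector keeps unit length and the point never leaves the sphere; all of the complexity comes from the twist angle depending on where the point currently sits.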

The behaviour of an atom under this simple 
twist/turn map is rich and complex. To visual- 
ize it, consider the flattened representation of 
the sphere in Figure 2a, which shows the initial 
orientations of two groups of atoms forming 
two short line segments. The effect of repeating 
the twist/turn procedure ten times is shown in 
Figure 2b: one set of orientations is margin- 
ally distorted, whereas the other is stretched 
and folded in an intricate way. The stretching 
is indicative of erratic chaotic behaviour, and 
the point is that the dynamical behaviour in a 
given system can be mixed — certain initial 
orientations lead to chaotic dynamics, whereas 
others are comparatively ordered. This is best 
shown in Figure 2c, which plots the effects 
of many twist/turn iterations on several ini- 
tial orientations. Chaotic regions appear as a 
mass of dots, whereas stable regions are neatly 
organized into nested, ring-like layers. The 
important lesson to remember for now is this: 
because of the stretching, an atomic orientation 
in the chaotic region can wander throughout
it, whereas an atom in an ‘island of stability’ is
trapped there, confined to its particular ‘ring’.

Figure 1 | The kicked top. a, The angular-momentum vector (arrowed) of an atom can be visualized
as corresponding to a single point on a sphere, which represents all possible angular momenta. b, The
first, or ‘kick’, step in realizing the ‘kicked top’ model system, which Jessen and colleagues¹ implement
in their study of quantum chaos, is a ‘twist’ of the points on the sphere — the points near the poles
rotating the most, and the points on the equator staying put. c, The second step is a simple rotation of
the whole globe about an orthogonal axis (not shown).

Figure 2 | Chaos in the kicked top. a, In this ‘flattened globe’, the two coloured line segments denote two sets of points on the sphere, each representing the
initial angular momenta of two collections of atoms. b, The effect of ten iterations of the kicked-top transformation depicted in Figure 1: the green line
segment gets only a bit twisted, whereas the red segment is dramatically stretched and folded onto itself — a hallmark of chaos. c, Many iterations of several
starting points, clearly showing regions of stability (onion-like rings) and chaos (a fuzz of dots).

But now back to quantum mechanics — 
we’re talking about atoms, after all. As a consequence
of Heisenberg’s uncertainty principle,
quantum states of atoms can’t be single points
on the sphere, but must be smeared out to 
occupy at least some finite area. And again, 
there can be no chaos in the quantum case, 
in stark contrast to the classical model. Tradi- 
tionally, there have been two approaches to this 
problem of the missing quantum chaos. One is 
to study the conditions under which the classi- 
cal and quantum descriptions agree. For exam- 
ple, under a weak, continuous measurement, a 
quantum system can be persuaded to display 
chaos as appropriate to the classical case. The
other is to study the ‘fingerprints’ of chaos in
the quantum system, and this is the approach
taken by Jessen and collaborators¹.

The authors studied a phenomenon called 
dynamical tunnelling. This is a bit different
from the better-known barrier tunnelling, 
in which a quantum particle can penetrate a 
potential barrier despite not having enough 
energy to hop over it. Recalling the kicked-top 
behaviour depicted in Figure 2c, notice that 
there are two main stable islands in the left 
hemisphere and that a consequence of stabil- 
ity is that, classically, an atom starting in either 
island is trapped there — not by any potential 
barrier, but merely as a consequence of the 
twist/turn dynamics. Because of the symme- 
try of these two islands, quantum mechanics 
allows an atom starting in one island to hop 
back and forth to the other island, a dynamical 
tunnelling process between two atomic orien- 
tations strictly forbidden in the classical world. 
Jessen and collaborators’ experiments clearly 
demonstrated this, as well as an atomic quan- 
tum state sitting placidly in the large island 
and another moving erratically (though not 
chaotically) in the chaotic region — carefully 
respecting the classical boundaries between 
stability and chaos, despite being far into the 
quantum regime. 

The beauty of the experiments¹ lies in the
complete reconstruction of the quantum state,
leaving no aspect of the tunnelling process hidden.
This is no easy task, involving the processing
and combination of many measurements,
and was not possible in previous studies of
tunnelling. The recovery of the full state also
permitted observations of other fingerprints of 
chaos in a quantum system for the first time, 
such as the generation of quantum entangle- 
ment and the sensitivity to perturbations to 
the parameters of the system, rather than to 
its initial state.

Interesting future directions for Jessen and 
colleagues’ work include a push towards the 
classical limit, where more distinct quantum 
states live on the sphere. This is a technically dif- 
ficult regime, but one in which the fingerprints 
of chaos can be studied in even more detail, and 
where the controlled transition from quantum 
stability to classical chaos may be observed. 
VISION


Gene therapy in colour 


Robert Shapley 


Replacing a missing gene in adult colour-blind monkeys restores normal 
colour vision. How the new photoreceptor cells produced by this therapy 
lead to colour vision is a fascinating question. 


Colour blindness is a common genetic disorder 
(affecting about 5-8% of males, although fewer 
than 1% of females) in which the absence of a 
single gene on the X chromosome leads to a 
specific loss of function. Normal human colour 
vision relies on three distinct photopigments 
in the retina’s cone photoreceptors. Those who 
do not inherit the gene for one of the three 
cone pigments are called dichromats; such 
individuals cannot distinguish the difference 
between some pairs of colours that trichro- 
mats can discriminate easily. John Dalton, 
the famous British chemist, was a dichromat, 
and colour blindness is often referred to as 
daltonism. 

Colour blindness is common in New World 
monkeys, such as the squirrel monkey (Saimiri 
sciureus), because the species does not have all 
three of the cone-pigment genes that humans 
usually have. All male and some female squirrel 
monkeys are colour-blind dichromats, although
most female squirrel monkeys achieve trichromatic
colour vision. But let’s pay attention to
squirrel monkey dichromats. Mancuso et al.¹
report in this issue (page 784) that injecting a
virus carrying a gene for the missing photopigment
into the retina of adult colour-blind
squirrel monkeys confers normal trichromatic
vision; 20 weeks after injection the new pigment
was expressed in cone photoreceptors
and the formerly dichromatic monkeys began
to discriminate between two colours that had
looked identical to them before treatment.
Mancuso and colleagues¹ named one of their
dichromatic monkeys Dalton after the chemist,
but at the end of their experiments their Dalton
was no longer colour blind. The success of these
experiments offers promise that, perhaps in
the foreseeable future, a similar therapy might
improve visual function in humans. At the same
time, these results raise a number of interesting
questions about colour vision in primates.

© 2009 Macmillan Publishers Limited. All rights reserved

Figure 1 | Two paths to colour vision after gene therapy. The figure shows two different schemes that 
could generate trichromatic vision in populations of a dichromat’s cone-opponent ganglion cells 
after gene therapy. Excitatory connections between cones and ganglion cells are indicated by white 
bars and inhibitory connections by black bars. a, The new pigment M, is inserted into cones that 


excite ganglion cells that are not cone-opponent and do not respond to colour in the dichromats (left) 


because they compute differences of M,-cone inputs only. These cells may become M, — M, cone- 
opponent cells after injection (right), producing a functional red-green pathway in treated monkeys. 
This is a simplified schema; a more realistic connectivity is (M,+M,)—M,. b, The new pigment M, 
can substitute for some M, pigment and thereby generate S — M, cells (right) that exist alongside and 
functionally complement the S — M, cells that are already present in the dichromat (left). Once again, 


amore realistic picture would show the treated cells to be S— (M,+Ms). 


In humans, monkeys and most other verte- 
brates, each cone photoreceptor absorbs light 
over a broad range of the visible spectrum 
and transduces it into electrical signals. We 
identify each cone type by its light-absorb- 
ing photopigment, which is named for the 
wavelength of peak absorption. Humans 
have short-wavelength S-cones (with peak 
absorption at ~440 nm), medium-wavelength 
M-cones (peak absorption ~535 nm) and 
longer-wavelength L-cones (peak absorption 
~560 nm). Colour blindness in humans is 
usually caused by an absence of either M- or 
L-cones; from his symptoms, we can infer that 
John Dalton was missing M-cones. 

Dichromatic squirrel monkeys have S-cones 
but only one other cone type — a middle-wave- 
length cone that contains only one of three pos- 
sible middle-wavelength pigments (denoted 
M₁, M₂ and M₃) with peak absorptions at 535,
545 and 560 nm, respectively; M₃ in the squirrel
monkey is like the human L-cone pigment,
M₁ like the human M-cone pigment.

There is no direct connection between the 
peak absorption wavelength and the role of the 
cone in colour perception; although L-cones 
(560 nm) are crucial for our ability to see red, 
the appearance of 560-nm light is in fact green- 
ish-yellow. But there is no mystery about this — 
the signals for colour are not the signals from 
individual cones but rather the cone-difference 
signals computed by post-receptoral cells in the 
retina and in the brain. Signals from the cones 
are passed through bipolar cells to the retinal 
ganglion cells which lie in a deeper layer of 
the retina and transport visual information to 


the brain. The retinal ganglion cells that carry 
signals about colour are called cone-opponent 
ganglion cells because they subtract the signals 
from different types of cone photoreceptor’. 
In most mammals there are ganglion cells that 
subtract signals from longer-wavelength cones 
from the excitatory signals from S-cones, and 
these ganglion cells tell the difference between 
blue and yellow’. For example, in Old World 
primates the blue-yellow signal difference is 
usually computed as S − (L + M). The blue-yellow
ganglion-cell pathway in an individual
squirrel monkey dichromat can be S − M₁,
S − M₂ or S − M₃, depending on what longer-wavelength
pigment the monkey has.

Humans and Old World monkeys also have 
red-green cone-opponent retinal ganglion 
cells, the responses of which are proportional 
to the difference between signals from L-cones 
and M-cones (L − M or M − L). In trichromatic
squirrel monkeys there is also a red-green
pathway that computes the difference between
the two longer-wavelength cones: M₁ − M₂,
M₃ − M₁, or whatever pair of cones the monkey
has*. In Old-World monkeys there are many 
more red-green than blue—yellow ganglion 
cells, but in trichromatic squirrel monkeys 
the blue—yellow ganglion cells are much more 
numerous than the red-green’. 

Dichromatic squirrel monkeys with only
M₁-cones cannot discriminate blue-green
lights with wavelengths of around 495 nm from
grey light. But Mancuso and colleagues¹ report
that, after therapy with the gene encoding the
M₃ pigment, their treated monkeys can easily
tell blue-green from grey, just like trichromats.
One of many open questions to arise from these
results is: what type of cone-opponent ganglion
cell is active in the treated monkeys?

There are two possibilities (Fig. 1). First,
M₁-cone signals from cones not affected by the
virus could be subtracted from the new M₃-cone
signals, in effect producing a new functioning
red-green (c₃M₃ + c₁M₁) − M₁ pathway,
where c₃ and c₁ are weighting coefficients ≤1.
This possibility would require that M₃-cone
signals are connected with some specificity
to ganglion cells; for instance, M₃-cone signals
would only be excitatory, whereas M₁-cone signals
could remain both excitatory and inhibitory.

A second possibility is that new M₃-cone
signals could be subtracted from S-cone
signals to produce a new functioning blue-yellow
S − (M₁ + M₃) pathway that would
complement the S − M₁ pathway already
present in the dichromat*. Having both S − M₁
and S − (M₁ + M₃) cells would allow the monkey
to discriminate between blue-green and grey.
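
The logic of this second possibility can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes Gaussian cone sensitivity curves of a common, arbitrary width (40 nm), uses the peak wavelengths quoted earlier for the S, M₁ and M₃ pigments, and stands in for 'grey' with a two-wavelength mixture tuned so that the S and M₁ cones respond exactly as they do to 495-nm light. All function names are illustrative.

```python
import math

PEAKS_NM = {"S": 440.0, "M1": 535.0, "M3": 560.0}  # peak absorptions quoted in the text
WIDTH_NM = 40.0  # assumed Gaussian width; real pigment spectra are not Gaussian


def sensitivity(cone, wavelength_nm):
    """Toy relative sensitivity of one cone type at one wavelength."""
    d = wavelength_nm - PEAKS_NM[cone]
    return math.exp(-d * d / (2.0 * WIDTH_NM ** 2))


def catches(light):
    """Cone responses to a light given as {wavelength_nm: intensity}."""
    return {c: sum(i * sensitivity(c, w) for w, i in light.items()) for c in PEAKS_NM}


def dichromat_match(test_nm, primaries_nm=(450.0, 600.0)):
    """Mix two primaries so that S and M1 respond as they do to test_nm."""
    p, q = primaries_nm
    a11, a12 = sensitivity("S", p), sensitivity("S", q)
    a21, a22 = sensitivity("M1", p), sensitivity("M1", q)
    b1, b2 = sensitivity("S", test_nm), sensitivity("M1", test_nm)
    det = a11 * a22 - a12 * a21  # Cramer's rule for the 2x2 system
    return {p: (b1 * a22 - a12 * b2) / det, q: (a11 * b2 - a21 * b1) / det}


blue_green = catches({495.0: 1.0})
comparison = catches(dichromat_match(495.0))

# Existing channel, S - M1: identical for both lights, so the dichromat sees no difference.
old_a = blue_green["S"] - blue_green["M1"]
old_b = comparison["S"] - comparison["M1"]

# Hypothesised extra channel, S - (M1 + M3): the added pigment separates the two lights.
new_a = blue_green["S"] - (blue_green["M1"] + blue_green["M3"])
new_b = comparison["S"] - (comparison["M1"] + comparison["M3"])

print(f"S-M1:      {old_a:.3f} vs {old_b:.3f}")
print(f"S-(M1+M3): {new_a:.3f} vs {new_b:.3f}")
```

In this toy model the two lights are indistinguishable through the S − M₁ channel alone, but differ once a second channel carrying M₃ signals is available, which is the essence of the argument; the numbers themselves have no physiological meaning.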

One outstanding feature of Mancuso and
colleagues’ data¹ makes the second explanation
(let’s call it the blue-yellow hypothesis)
more plausible. The authors monitored the
time course of cone-pigment function after
gene therapy by measuring cone signals in
an electroretinogram (ERG), and they report
that signs of new, functioning M₃-cone pigment
appeared about 20 weeks after injection.
Almost simultaneously with the appearance 
of viable new photopigment, the formerly 
dichromatic monkeys became able to perform 
the colour-discrimination task as proficiently 
as trichromats. That there was no measur- 
able delay in visual function suggests that the 
new cone signals were combined immedi- 
ately in pre-existing colour channels from eye 
to brain. 

The blue-yellow hypothesis would theoreti- 
cally require little or no rewiring, which is why 
it seems more likely. But this is only specula- 
tion. The question can be answered by mak- 
ing electrophysiological measurements in the 
treated squirrel monkeys to determine whether 
or not there are new red-green cone-opponent 
retinal ganglion cells or red-green cells in the 
lateral geniculate nucleus” (the first target of 
retinal ganglion cells), and also whether or not 
there are new S − M₃ or S − (M₁ + M₃) blue-yellow
cells as well as S − M₁ cells.

In their paper¹, Mancuso et al. remind us of
the long-held belief that “neural connections
established during development would not
appropriately process an input that was not
present from birth”, but their results refute this
idea. Their paper is a pointer to future exciting
research.

COSMOLOGY


Dark is the new black 


Richard Massey 


Rival experimental methods to determine the Universe's expansion are 
contending to become the fashionable face of cosmology. Fresh theoretical 
calculations make one of them the hot tip for next season. 


Since the Big Bang, the Universe’s initial 
expansion has been gradually slowed by the 
gravitational pull from the mass it contains. 
Most of this mass is in the form of invisible and 
mysterious dark matter. Today, however, the 
Universe seems to be re-accelerating under the 
influence of even weirder stuff dubbed dark 
energy. For astronomy funding purposes, ‘dark
is the new black’. Almost nothing is understood
about either dark matter or dark energy,
but both are many times more common
than visible matter, and their tug of
war will shape the fate of the entire
cosmos.

Tracking the expansion of the 
Universe, from which the relative 
amounts of dark matter and dark 
energy can be inferred, requires 
measuring the distances to galax- 
ies. Distances have always been the 
bane of astronomy: there are no sim- 
ple red and green glasses to extrude 
our two-dimensional picture of the 
sky into an expanding movie. Three 
rival techniques are currently trying 
to establish themselves as the best 
probe of cosmological expansion. 
A series of calculations by Schmidt
et al.¹,² now allows one contender,
gravitational lensing, to predict the
observational consequences of different
cosmological theories with sufficient
accuracy for them to be distinguished
by future galaxy surveys.

The accelerated expansion of the Universe 
was first detected about a decade ago** from 
observations of exploding stars called type 
Ia supernovae. These explosions happen at 
the same phase of stellar evolution, so they 
should all be of the same intrinsic bright- 
ness, regardless of their distance, but should 
look fainter the farther away they are from 
Earth. However, the accelerating expansion 
of the Universe means that distant superno- 
vae have already receded farther from us and 
look even fainter. Initial enthusiasm for using 
supernovae as cosmic distance indicators, and 
thus as a probe of the Universe's expansion, 
garnered vast allocations of time on ground- 
and space-based telescopes, and triggered the 
first plans for a dedicated, all-sky successor to 
the Hubble Space Telescope. Unfortunately, 
the explosions were later found to depend on 
the stars’ environment and ingredients, which 
evolve over cosmic time. Such effects can be 
parameterized only to a certain precision, and 
the technique is falling out of fashion.

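
The standard-candle reasoning above rests on the inverse-square law: if every explosion has the same intrinsic brightness, its apparent faintness gives its distance. The following is a minimal sketch of that arithmetic only. It ignores the redshift and expansion corrections that real analyses apply, and the peak absolute magnitude of −19.3 is an assumed typical value, not a number from the article.

```python
import math

SN_IA_PEAK_ABS_MAG = -19.3  # assumed typical value, not taken from the article


def flux(luminosity_watts, distance_m):
    """Inverse-square law: energy per second per square metre at the observer."""
    return luminosity_watts / (4.0 * math.pi * distance_m ** 2)


def distance_parsec(apparent_mag, absolute_mag=SN_IA_PEAK_ABS_MAG):
    """Invert the distance modulus m - M = 5 log10(d / 10 pc)."""
    return 10.0 ** ((apparent_mag - absolute_mag) / 5.0 + 1.0)


# Doubling the distance quarters the flux of a fixed-luminosity source.
ratio = flux(1.0, 2.0) / flux(1.0, 1.0)

# A hypothetical supernova observed at apparent magnitude 24.
d_mpc = distance_parsec(24.0) / 1.0e6
print(f"flux ratio {ratio:.2f}, distance about {d_mpc:.0f} Mpc")
```

Comparing distances obtained this way with the redshifts of the host galaxies is what reveals how the expansion rate has changed; any drift in the assumed intrinsic brightness over cosmic time feeds directly into the inferred distance.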
Distances can also be determined from the
focal lengths of gravitational lenses. Gravitational
lensing is the deflection of light from
distant galaxies when it passes through the
warped space-time around foreground ‘lensing’
masses along our line of sight. Just as in
conventional optics, the efficiency of light
deflection depends on the distance to the
lens and to the source. Faraway galaxies look
slightly magnified, and their shapes are distorted
(Fig. 1). Characteristic patterns induced
by lensing in the apparent shapes of distant
galaxies were first observed in 2000, and were
first used to constrain the properties of dark
energy in 2006⁵. Measuring the subtle shape
changes in distant galaxies requires a telescope
with exceptional optics, which is possible only
above Earth’s atmosphere. But the technique
was initially hailed as perfectly clean, because
the only underlying physics is Einstein’s well-understood
theory of general relativity.

Figure 1 | You thought that light travels in straight lines? Not so in the
curvy world of gravitational lensing, where new results¹,² disentangle
the zoom from the fisheye.

Following the same product life cycle as 
supernova distances, further studies revealed 
several potential physical flaws. First, lens- 
ing measurements assume that galaxies’ true 
shapes are random, to infer that any observed 
patterns are produced entirely by light deflec- 
tion around the foreground mass. However, 
the tidal gravitational forces between adja- 
cent galaxies may elongate them towards each 
other, and one slightly in front may itself lens 
one slightly behind; both effects can mimic 
the lensing signal and bias the measured
distances. Second, lensing simultaneously 
distorts and enlarges galaxies, and the two 
effects cannot be measured independently with- 
out one biasing the other. Finally, the lensing 
magnification also makes a survey more likely 
to find highly distorted galaxies and ignore 
undistorted ones. 

The first of the aforementioned effects, 
known as the intrinsic-alignments problem, 
can be overcome by a three-dimensional analy- 
sis of the galaxy locations, in which the align- 
ment is measured from close pairs and then 
subtracted from the rest®. The second, called 
reduced shear, was solved’ by changing the the- 
ory to meet the data, diminishing the expected 
increase in the distortion signal at a given dis- 
tance behind a lens that also enlarges. Schmidt 
et al.” have now performed a similar feat with 
the final, ‘magnification bias’ problem. 
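
The coupling between distortion and enlargement can be made concrete with the standard weak-lensing relations. In the sketch below, the symbols κ (convergence, which enlarges an image) and γ (shear, which stretches it) are the usual textbook notation rather than anything defined in this article, and the input values are arbitrary illustrations.

```python
def reduced_shear(gamma, kappa):
    """Observable distortion: the shear gamma rescaled by the convergence kappa."""
    return gamma / (1.0 - kappa)


def magnification(gamma, kappa):
    """Factor by which lensing enlarges (and brightens) a background galaxy."""
    return 1.0 / ((1.0 - kappa) ** 2 - abs(gamma) ** 2)


# Arbitrary illustrative values for a weakly lensed galaxy.
gamma, kappa = 0.05, 0.10
g = reduced_shear(gamma, kappa)
mu = magnification(gamma, kappa)
print(f"shear {gamma:.3f} is observed as {g:.4f}; magnification {mu:.3f}")
```

Because shapes measure the reduced shear rather than the shear itself, the expected signal has to be adjusted wherever the lens also enlarges, and because the magnification changes which galaxies are bright or large enough to enter a survey, the sample itself depends on the lensing being measured.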

Schmidt and colleagues’ solution’ is a cru- 
cial advance for the technique of gravitational 
lensing, but is not without limitations. 
For it to work, theoretical calculations 
against which observations are com- 
pared must correctly predict complex 
statistics of the cosmic distribution 
of mass. Looking farther along some 
sight lines than others also mixes the 
cosmological signal with the method's 
built-in ‘B-mode’ control experiment.
This had previously been used to 
check for potential imperfections in 
the telescope optics, so they now need 
to be even better. 

Distances can also be measured 
by one final technique. Ripples from 
sound waves generated in the early 
Universe left their imprint on relic 
radiation from the Big Bang — the 
cosmic microwave background — 
and also on structures at all cosmic 
epochs. In patterns known as baryon 
acoustic oscillations, galaxies visible 
today are preferentially separated 
from each other by a set physical distance — 
which depends on the size of the sound-wave 
ripples and reliably seems to be smaller the 
farther away the galaxies are. This technique 
was not even considered worth mentioning 
in research proposals in the mid-1990s, but 
emerged in 2005 as the most important result 
from two large galaxy surveys — the Two- 
degree-Field and Sloan Digital Sky Survey’. 
Larger ground-based telescopes are currently 
setting out to measure this effect, but seeds of 
doubt are already emerging about how faith- 
fully real galaxies trace the original ripples. 

As scientific fashions come and go, the 
rivalry between the three houses might be 
more at home on the catwalks of Paris or 
Milan. The techniques are at different stages 
of the same product cycle. Initial hype draws 
a flurry of excitement, but when system- 
atic physical flaws show up, sober reflection 
brings a sheepish look back at the design. 
Some methods may be consigned to a dusty 
drawer. But the stitch or two of alterations by
Schmidt and colleagues¹,² has ensured that
gravitational lensing will still be on the hot list
next season.


MICROBIOLOGY 


Life on leaves 


Johan Leveau 


The surface of plant leaves — the phyllosphere — is home to many 
microbes. A ‘community proteogenomics’ approach offers a fresh 
look at what it takes to survive and thrive in this unique habitat. 


Under the microscope, aerial plant leaves 
resemble eerie landscapes, with deep gorges, 
tall peaks and gaping pits that riddle the waxy 
surface. Add to this scenery a climate that 
features temperature highs of 50°C or more, 
exposure to harmful ultraviolet rays, erratic 
periods of drought and limited access to nutri- 
ents, and one gets the picture that this is a hos- 
tile environment. Still, many bacteria, fungi, 
yeast and other microorganisms dwell in great 
abundance in this ‘phyllosphere’¹, which is the
subject of a new investigation by Delmotte and
colleagues². In their paper, published in Proceedings
of the National Academy of Sciences,
they bring twenty-first-century tools to bear 
on the phyllosphere, with special reference to 
bacteria. 

Much is known about microbial adaptations 
to the leaf surface — for instance the produc- 
tion of pigments to avoid DNA damage from 
solar radiation or the accumulation of compat- 
ible solutes to deal with water stress. However, 
most of this knowledge has been inferred from 
single microbial species, from cultivating rep- 
resentative isolates in the laboratory and from 
exposing isolates artificially to plant foliage for 
an assessment of which genes contribute to 
microbial fitness in the phyllosphere. Delmotte 
and colleagues’ investigation² is an exercise in
‘community proteogenomics’. This approach
does not rely on cultivation, does not focus on a
single species, and does not suffer from the controlled
conditions that typify lab experiments.
The result is a snapshot-like, culture-inde- 
pendent insight into the diverse mechanisms 
that underlie the success of leaf-surface micro- 
bial colonists — in this case bacteria, the most 
abundant of the colonists at estimated densities 
of 10⁶–10⁷ cells per square centimetre (ref. 3).

Community proteogenomics⁴ arose from the
marriage between metagenomics and metaproteomics.
Metagenomics involves analysis
of the mix of all microbial DNA present in a
particular environmental sample, whereas
metaproteomics does the same for all proteins.


The metaproteomic portion of Delmotte and 
colleagues’ approach involved collection of 
microbial biomass from leaf surfaces, protein 
extraction and digestion, separation of the frag- 
ments by liquid chromatography and analysis 
by mass spectrometry. The result was a mixed 
bag of nearly half a million spectra, each corresponding
to a short peptide sequence. Linking
these spectra to proteins with a possible func- 
tion and evolutionary origin is a challenge and 
is possible only with a proper frame of refer- 
ence. Typically, this frame is provided by the 
publicly available databases of annotated DNA 
and protein sequences. 

However, if a microbial community has 
few representatives in the public database, the 
chances are that many of the sequences in the 
database will be too dissimilar to allow positive 
matching with short peptide sequences from 
the environmental proteome. This is where 
the metagenomic part of the proteogenomic 
approach comes in: it increases the probability 
of protein identification by metagenomic profil- 
ing of the same sample from which the proteins 
were extracted. In the case of Delmotte et al.²,
pyrosequencing was used to construct a repre- 
sentative library of DNA sequences from the 
leaf samples: by including these metagenomic 
data on top of the sequences in the public data- 
base, up to 87% more proteins could be iden- 
tified in the bacterial leaf communities. This 
suggests that many bacteria from the leaves of 
the plants that were investigated — soya bean, 
clover and Arabidopsis — are indeed genetically 
distinct from the bacteria for which genomic 
data are currently available. This was especially 
true for members of the genus Sphingomonas, 
which were among the most numerous bacteria 
present. Were it not for the metagenomic data, 
none of the abundant proteins assigned to this 
genus would have been identified. 
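
The database-matching step described above can be sketched in a few lines. This is a toy illustration, not the authors’ pipeline: real workflows score mass spectra against predicted fragment masses, whereas the sketch simply compares peptide strings. It assumes trypsin-style digestion (cleavage after K or R unless followed by P), which the text does not specify, and all sequences and function names are made up for the example.

```python
import re


def digest(protein, min_len=5):
    """Cut after K or R (not before P) and keep peptides of at least min_len residues."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def identify(observed_peptides, reference_proteins):
    """Return the observed peptides that occur in a digest of the reference proteins."""
    index = set()
    for protein in reference_proteins:
        index |= digest(protein)
    return {p for p in observed_peptides if p in index}


# Hypothetical sequences: one 'public database' entry and one 'metagenome' entry.
public_db = ["MKTAYIAKQRQISFVK"]
metagenome = ["MSLNNWEGRAAPLVDGK"]
observed = ["TAYIAK", "AAPLVDGK", "GGGGGK"]

public_only = identify(observed, public_db)
combined = identify(observed, public_db + metagenome)
print(len(public_only), len(combined))
```

The point of the design is that the reference set is extended with sequences from the same sample: a peptide from an organism absent from the public database can only be matched once that organism’s own DNA has been sequenced and added.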

The phyllosphere metaproteogenome
reveals that many of the highly expressed bacterial
proteins (porins, TonB-like proteins
and components of ABC-type transporters,
for example) are apparently involved in
scavenging what little food there is available 
on the leaf surface. This possibility is consist- 
ent with studies showing the limited access to 
nutrients in the phyllosphere, such as the prod- 
ucts of photosynthesis that leak from the leaf 
interior’. Proteins for using methanol, a plant 
waste product, were also abundant and could 
be assigned to Methylobacterium species — leaf 
colonizers of many different plants®. Stress pro- 
teins were over-represented as well, revealing a 
need to protect the bacterial cells from oxida- 
tive and osmotic damage, and to prevent them 
from becoming desiccated. One of the sur- 
prising finds was the prominence of a protein 
containing a fasciclin domain, possibly involved 
in cell adhesion, but with no previously 
suspected role in survival in the phyllosphere. 
The wider context for this line of research is 
illustrated by considering the significance of 
microbial populations on leaves. They play a 
part in the global nitrogen and carbon cycles; 
they participate in removing airborne pollut- 
ants’; and they contribute to the decomposi- 
tion of leaf litter and to the production of plant 
and animal feed by composting and silaging. 
The traditional focus of phyllosphere research 
has been on microorganisms that are of 
agricultural relevance, in particular plant path- 
ogens and their antagonists. But the discovery 
of archetypal leaf bacteria such as the plant- 
pathogenic Pseudomonas syringae in non- 
agricultural environments’, and the detection 
of human enteropathogens such as Escherichia 
coli O157:H7 on leaf surfaces’, are inviting a
more expansive view of the phyllosphere as a 
source and sink of environmental bacteria. 
One value of the study by Delmotte et al.²
is that it will help to draw this microbial habitat
to the attention of a broader audience of
researchers and into the field of comparative
‘-omics’. It will also serve as a baseline for
further proteogenomic excursions into the 
phyllosphere, which are likely to involve stud- 
ies at higher resolution, both temporally and 
spatially. Among the issues to be addressed are 
the dynamics of microbial protein expression 
relative to changes in community composition, 
and the role of the plant and its environment 
in driving the functional plasticity of foliage-associated
microorganisms.

STRUCTURAL BIOLOGY


Tracing Argonaute binding 


Samir Bouasker and Martin J. Simard 


Argonaute proteins inhibit gene expression by binding to messenger RNA 
via a small nucleic-acid guide. Structures of the Argonaute complex bound 
to target RNA reveal snapshots of a silencing machine at work. 


Argonaute proteins are essential regulators 
of gene expression, being key components of 
gene-silencing pathways mediated by RNA. 
In plants and animals, Argonaute proteins 
interact with small RNA molecules to form an 
enzymatically active complex called RISC. The 
RISC complex silences target genes by binding 
to messenger RNA that has sequence comple- 
mentarity to the Argonaute-bound RNA. Some 
Argonaute proteins cleave mRNA; others have 
lost their catalytic activity and regulate gene 
expression by some other means, most likely 
by inhibiting mRNA translation’. 

Although previous biochemical studies have 
revealed much about the molecular mecha- 
nisms of RNA silencing, exactly how Argonaute 
proteins interact with small RNA molecules 
as well as with their target mRNAs has been 
unclear. The paper by Wang and colleagues²
from the Patel and Tuschl labs (page 754 of this 
issue) is therefore especially welcome — the 
authors report the results of a series of impres- 
sive structural studies that reveal the molecular 
dynamics of the Argonaute protein as it binds 
to, and slices up, its RNA target. 

Argonaute proteins have four domains: the 
amino-terminal, PAZ, MID and PIWI domains 
(Fig. 1). The PAZ domain is implicated in the 
binding of single-stranded RNA, whereas the 
PIWI domain has endonucleolytic activity 
— the ability to cleave nucleic acid at internal 
bonds. The first crystal structures of Argo- 
naute proteins from prokaryotes*” (bacteria 
or archaea) revealed that the PIWI domain is 
structurally similar to members of the RNase H
family of endonucleases, which use DNA as a
guiding template to target RNA molecules.
Although prokaryotic Argonaute proteins use 
DNA as the nucleic-acid template strand, plant 
and animal Argonaute proteins have evolved to 
use single-stranded RNA, rather than DNA, as 
a template to target RNA. 

The Patel and Tuschl labs recently reported® 
the structure of a binary complex of bacterial 
Argonaute bound to a 5’-phosphorylated DNA 
guide strand. In this work, they showed that 
both ends of the 21-nucleotide DNA guide 
strand are anchored within the Argonaute 
protein — the 5’-phosphate end is anchored 
in a binding pocket in the MID domain, and 
two nucleotides at the extreme 3’-hydroxyl 
end of the guide DNA strand are anchored 
in a pocket of the PAZ domain. The same 
two groups described’ the first ternary com- 
plex of bacterial Argonaute, consisting of 
the Argonaute protein bound to guide DNA 
plus a 20-nucleotide target RNA. To solve this 
ternary structure, Patel, Tuschl and colleagues 
had to prevent Argonaute from cleaving the 
target RNA; they did this by introducing base 
mismatches between the guide DNA and the 
target RNA. However, this manoeuvre made it 
impossible to assess guide DNA-target RNA 
base-pairing at and beyond the mispaired 
cleavage site. So although this study’ showed 
that the Argonaute complex undergoes confor- 
mational changes when it binds to target RNA, 
the researchers could not clearly observe the 
behaviour of Argonaute during the nucleation 
and propagation of the guide DNA-target RNA
complex or during cleavage of target RNA.

In their latest study, the authors² circumvent
this problem by using a catalytic mutant
of bacterial Argonaute, which is unable to slice
up target RNA. The authors analysed several
structures of the mutant Argonaute protein
bound to a guide DNA that is fully base-paired
to target RNA molecules of different lengths.
As previously observed, both ends of the guide 
strand are anchored in the Argonaute binding 
pockets (Fig. 1a). 

The nucleation step begins with the binding 
of the guide DNA to target RNA at the DNA 
5'-phosphate end (Fig. 1b). The guide DNA 
and target RNA form base pairs and ‘zipper
up’, forming a DNA-RNA double helix that
extends from the 5’ end of the guide DNA to 
DNA nucleotide 16. Pivoting of the Argonaute 
protein allows double-helix formation while 
both ends of the guide DNA are anchored in the 
Argonaute binding sites. Beyond nucleotide 16, 
Argonaute’s N-terminal domain blocks addi- 
tional base-pairing towards the 3’-hydroxyl 
end of the guide strand. The propagation of 
base-pairing between the guide DNA and the 
target RNA before this obstruction results in 
the release of the 3’ end of the DNA from its 
anchor site in the PAZ domain (Fig. 1c). This 
release allows rotation of the PAZ domain, 
leading to a conformational change that 
favours the positioning of the cleavage site of 
target RNA close to the catalytic residues in the 
PIWI domain (Fig. 1d). 
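
The zippering described above can be mimicked with a simple string model. This is a sketch under stated assumptions rather than a representation of the crystal structures: it treats pairing as strictly Watson-Crick, counts from the first guide nucleotide (a simplification of how the 5' end is held), aligns the 3' end of the target with the 5' end of the guide, and stops at position 16 to stand in for the block imposed by the N-terminal domain. The 21-nucleotide guide sequence and all function names are arbitrary.

```python
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA guide base -> RNA partner


def target_for(guide_dna):
    """Fully complementary RNA target, written 5'->3' (antiparallel to the guide)."""
    return "".join(PAIR[base] for base in reversed(guide_dna))


def duplex_length(guide_dna, target_rna, block_after=16):
    """Count contiguous base pairs from the guide's 5' end, stopping at the block."""
    paired = 0
    for i, base in enumerate(guide_dna[:block_after]):
        # Guide position i faces the target read from its 3' end.
        if i >= len(target_rna) or target_rna[-(i + 1)] != PAIR[base]:
            break
        paired += 1
    return paired


guide = "TGAGGTAGTAGGTTGTATAGT"  # arbitrary 21-nucleotide DNA guide
target = target_for(guide)

# Introduce one mismatch opposite guide position 5 (index 4).
k = len(target) - 5
mismatched = target[:k] + ("A" if target[k] != "A" else "C") + target[k + 1:]

print(duplex_length(guide, target), duplex_length(guide, mismatched))
```

In this model a perfectly matched target zips up to the blocking position and no further, whereas a single early mismatch halts propagation, which is why mismatched targets could not reveal pairing at and beyond the cleavage site.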

The structures solved by Wang et al.² also
clearly show that two magnesium ions that are
essential for cleavage activity are located one
on either side of the cleavage site. This requirement
for cations to facilitate site-specific cleavage
is also a feature of RNase H endonucleases,
confirming that Argonaute cleavage activity is
highly similar to that of RNase H enzymes.

Although studies of Argonaute proteins 
in prokaryotes are informative, prokaryotic 
Argonaute has not been implicated in small 
RNA-mediated silencing pathways. The next 
big challenge will be to solve the structures of
Argonaute proteins in plants and animals. It
will be interesting to know whether plant and
animal Argonaute proteins promote nucleation
of all of the guide strand’s nucleotides with the
RNA target to increase binding and silencing
specificity, or whether they nucleate only up to
position 16, like bacterial Argonaute. Another
question is whether interaction between animal
microRNA (a type of small RNA encoded
in the genome that is used as a guide strand)
and target mRNA can be accommodated by the
Argonaute protein, because microRNA typically
binds imprecisely to target mRNA and forms an
imperfectly paired RNA double helix. Comparing
the structural features of Argonaute proteins
from different organisms will help us to further
understand their functions within the RNA
silencing pathways and might even uncover
new roles for this versatile protein family.

Figure 1 | The Argonaute silencing complex at work. a, Argonaute proteins have four domains: the amino-terminal domain (N), PAZ, MID and PIWI. Each
Argonaute protein binds to a small nucleic-acid molecule (red; RNA in plants and animals, and DNA in bacteria), which functions as a template for binding
to complementary target RNA. The 5’-phosphate (5’ P) end of the guide nucleic acid is anchored in the MID domain, and the 3’-hydroxyl end (3’ OH) is
anchored in the PAZ domain. b, Structural studies by Wang and colleagues² reveal that when the Argonaute complex binds to target RNA, the nucleation step
begins with formation of a double helix by base pairing between the guide nucleic acid and the target RNA, commencing at the 5’-phosphate end of the guide
strand. c, Pivotal movement of the Argonaute protein allows extension of the double helix while the guide DNA is anchored at both ends. The 3’-hydroxyl
end of the guide strand is then released from the PAZ domain, allowing its rotation. d, This conformational change favours the exact positioning of the target
RNA cleavage site close to the Argonaute PIWI domain. Magnesium ions in the PIWI domain facilitate precise cleavage of the target.

PHOTONICS


One-way road for light 


Eli Yablonovitch 


The transmission of information from one place to another by light waves 
sent through waveguides is hampered by light attenuation and scattering 
loss. Magnetic photonic crystals could provide a solution to such problems. 


The concept of photonic crystals — periodically 
arranged structures specifically engineered to 
trap and guide light — grew from an initial 
analogy’ with the electronic band structure 
of semiconductors. In these materials, no elec- 
trons can be found that have energies within 
a range known as the ‘band gap’. Similarly, in
photonic crystals, photons whose frequen- 
cies fall within the ‘photonic band gap’ are 
prevented from flowing inside the material. 

Haldane and Raghu’ have recently extended 
the equivalence between the behaviour of 
photons in photonic crystals and that of 
electrons in electronic systems. They have 
predicted the photonic analogue of the ‘edge’ 
states that characterize the quantum Hall 
effect* that is experienced by the electrons of 
a two-dimensional (2D) electron gas when it 
is subjected to a strong magnetic field. Under 
certain conditions, photons can be confined 
to the edges of a 2D photonic crystal — one 
whose lattice structure has 2D periodicity — 
and be restricted to unidirectional propaga- 
tion. On page 772 of this issue, Wang et al.° 
report observing such photonic edge states 
in a magneto-optical 2D photonic crystal, 
verifying Haldane and Raghu’s theoretical 
prediction’. 

To achieve unidirectional photonic edge 
states requires a system that lacks time-reversal 
symmetry — that is, one with physical proper- 
ties that are not preserved by a time-reversal 
transformation. To realize such a ‘non-recip- 
rocal’ system, Wang and colleagues® used a 
photonic crystal consisting of a 2D-periodic 
arrangement of magneto-optical ferrite 
rods; the magneto-optical nature of the rods 




confers the desired time-reversal asymmetry 
on the system. After characterizing the sys- 
tem’s band gap, the authors demonstrated the 
unidirectional character of the system's edge 
states: forward-propagating transmission out- 
weighed backward-propagating transmission 
by almost 50 decibels. 
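
To put the decibel figure in linear terms, the short sketch below converts a level difference into a power ratio using the conventional definition of the decibel for power (ten times the base-10 logarithm of the ratio). The function names are illustrative.

```python
import math


def db_to_power_ratio(decibels):
    """Convert a level difference in decibels to a linear power ratio."""
    return 10.0 ** (decibels / 10.0)


def power_ratio_to_db(ratio):
    """Inverse conversion, from a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)


forward_over_backward = db_to_power_ratio(50.0)
print(f"50 dB corresponds to a power ratio of {forward_over_backward:,.0f}")
```

On this definition, a contrast of almost 50 decibels means that nearly a hundred thousand times more power is transmitted in the forward direction than in the backward one.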

Wang and colleagues’ experimental dem- 
onstration® of the correspondence between 
the optics of a photonic crystal and the ele- 
gant physics of the quantum Hall effect is not 
only a delight for fundamental science, it also 
opens the door to practical applications 
based on non-reciprocal photonic crystals. 
These crystals may provide the means to 
develop a new type of optical-fibre waveguide 
that would be utterly immune to energy loss 
caused by scattering from material defects or 
obstacles. 

Photonic-crystal fibres’, a form of optical 
fibre based on 2D photonic crystals, have been 
very successful in providing unique functions® 
in fibre-optic communications. The most 
interesting type of photonic-crystal fibre has 
a hollow core in which light is confined by a 
surrounding cladding that consists of either 
a 2D-periodic photonic crystal or concentric 
‘Bragg rings”. Because their cores are hollow 
rather than being filled with a material sub- 
stance, light channelling through them suf- 
fers less absorption loss, enabling low-loss 
propagation over long distances. Indeed, it has 
been shown” that photonic-crystal fibres can 
achieve very low loss. But they are not quite as 
lossless as one would hope owing to scatter- 
ing caused by the intrinsic roughness of their 
internal (glass) cladding surfaces’®: the lowest 
reported loss is still about a factor of ten larger
than that of their best conventional counter- 
parts. In transoceanic optical-fibre systems, 
underwater amplifiers must be placed approxi- 
mately every 100 kilometres to compensate 
for loss. 

One of the distinctive properties of 3D- 
periodic photonic crystals is that light is con- 
fined in all directions. As a consequence, and 
unlike in ordinary fibres or 2D-periodic pho- 
tonic-crystal fibres, light travelling through 
hollow waveguides carved out of 3D-periodic 
photonic crystals is not subject to scattering 
loss. This increases the possibility of attaining 
ultra-low-loss light propagation, with both 
absorption and scattering losses suppressed. 
However, it does not prevent back scattering. 
Light can propagate both forwards and back- 
wards within the same hollow waveguide, 
and back scattering off an obstacle within the 
waveguide can reduce forward transmission, 
and so be a source of loss even in a 3D-periodic
photonic crystal. 

Haldane and Raghu’s theoretical ideas’, 
together with the experiments of Wang et al.°, 
now offer a solution to the back-scattering 
problem. By using a magneto-optical, phot- 
onic-crystal system that breaks time-reversal 
symmetry, Wang and colleagues show that 
it is possible to design the material’s disper- 
sion relationship, which describes the way in 
which wave propagation varies with frequency, 
such that, for a given frequency band, only 
forward-propagating waves exist. The ferrites 
the authors® used operate at microwave, rather 
than optical, frequencies. Nonetheless, there 
are several other magneto-optical materials 
that are used in the optical regime. 

To sum up, the ideal optical waveguide 
would be made of a low-loss hollow core, with 
a layer of non-reciprocal material, surrounded 
by a 3D-periodic photonic crystal, providing 
immunity to both back scattering and surface- 
roughness scattering. With such a low-loss 
waveguide, the possibility would then exist for 
one-hop transoceanic communication across 
10,000 kilometres — about the distance from 
San Francisco to Tokyo — without the cur- 
rent requirement for electronic repeaters or 
amplifiers.

REVIEWS


Finding the missing heritability of complex diseases


Teri A. Manolio, Francis S. Collins, Nancy J. Cox, David B. Goldstein, Lucia A. Hindorff, David J. Hunter,
Mark I. McCarthy, Erin M. Ramos, Lon R. Cardon, Aravinda Chakravarti, Judy H. Cho, Alan E. Guttmacher,
Augustine Kong, Leonid Kruglyak, Elaine Mardis, Charles N. Rotimi, Montgomery Slatkin, David Valle,
Alice S. Whittemore, Michael Boehnke, Andrew G. Clark, Evan E. Eichler, Greg Gibson, Jonathan L. Haines,
Trudy F. C. Mackay, Steven A. McCarroll & Peter M. Visscher


Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and 
traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively 
small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the 
remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability and propose 
research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics
of complex diseases and enhance its potential to enable effective disease prevention or treatment.


Many common human diseases and traits are known to
cluster in families and are believed to be influenced by
several genetic and environmental factors, but until
recently the identification of genetic variants contributing
to these ‘complex diseases’ has been slow and arduous’. Genome-wide
association studies (GWAS), in which several hundred thousand to 
more than a million single nucleotide polymorphisms (SNPs) are 
assayed in thousands of individuals, represent a powerful new tool for 
investigating the genetic architecture of complex diseases”. In the past 
few years, these studies have identified hundreds of genetic variants 
associated with such conditions and have provided valuable insights 
into the complexities of their genetic architecture**. 

The genome-wide association (GWA) method represents an 
important advance compared to ‘candidate gene’ studies, in which 
sample sizes are generally smaller and the variants assayed are limited 
to a selected few, often on the basis of imperfect understanding of 
biological pathways and often yielding associations that are difficult 
to replicate’®. GWAS are also an important step beyond family-based 
linkage studies, in which inheritance patterns are related to several 
hundreds to thousands of genomic markers. Despite many clear 
successes in single-gene ‘Mendelian’ disorders”*, the limited success 
of linkage studies in complex diseases has been attributed to their low 
power and resolution for variants of modest effect””’. 


The underlying rationale for GWAS is the ‘common disease,
common variant’ hypothesis, positing that common diseases are
attributable in part to allelic variants present in more than 1–5% of
the population'?"*. GWAS have been facilitated by the development of
commercial ‘SNP chips’ or arrays that capture most, although not all,
common variation in the genome. Although the allelic architecture of 
some conditions, notably age-related macular degeneration, for the 
most part reflects the contributions of several variants of large effect 
(defined loosely here as those increasing disease risk by twofold or 
more), most common variants individually or in combination confer 
relatively small increments in risk (1.1–1.5-fold) and explain only a
small proportion of heritability—the portion of phenotypic variance 
in a population attributable to additive genetic factors*. For example, 
at least 40 loci have been associated with human height, a classic 
complex trait with an estimated heritability of about 80%, yet they 
explain only about 5% of phenotypic variance despite studies of tens 
of thousands of people'’*. Although disease-associated variants occur 
more frequently in protein-coding regions than expected from their 
representation on genotyping arrays, in which over-representation of 
common and functional variants may introduce analytical biases, the 
vast majority (>80%) of associated variants fall outside coding 
regions, emphasizing the importance of including both coding and 
non-coding regions in the search for disease-associated variants’. 
The questions arise as to why so much of the heritability is apparently
unexplained by initial GWA findings, and why it is important. It is 
important because a substantial proportion of individual differences 
in disease susceptibility is known to be due to genetic factors, and 
understanding this genetic variation may contribute to better preven- 
tion, diagnosis and treatment of disease. It is important to recognize, 
however, that few investigators expected these studies immediately to 
find all of the variants associated with common diseases, or even most of 
them; the hope was that they would at least find some’®. Limitations in 
the design of early GWAS, such as imprecise phenotyping and the use of 
control groups of questionable comparability, may have reduced esti- 
mates of effect sizes while preserving some ability to identify associated 
variants'’. These studies have considerably surpassed early expectations, 
reproducibly identifying hundreds of variants in many dozens of traits, 
but for many traits they have explained only a small proportion of 
estimated heritability’®. 

Many explanations for this missing heritability have been sug- 
gested, including much larger numbers of variants of smaller effect 
yet to be found; rarer variants (possibly with larger effects) that are 
poorly detected by available genotyping arrays that focus on variants 
present in 5% or more of the population; structural variants poorly 
captured by existing arrays; low power to detect gene–gene interactions;
and inadequate accounting for shared environment among
relatives. Consensus is lacking, however, on approaches and priorit- 
ies for research to examine what has been termed ‘dark matter’ of 
genome-wide association—dark matter in the sense that one is sure it 
exists, can detect its influence, but simply cannot ‘see’ it (yet). Here 
we examine potential sources of missing heritability and propose 
research strategies to illuminate the genetics of complex diseases. 


Heritability and allelic architecture of complex traits 


It is reasonable to assume that allelic architecture (number, type, effect 
size and frequency of susceptibility variants) may differ across traits, 
and that missing heritability may take a different form for different 
diseases'’, but at present our understanding is too limited to distin- 
guish these possibilities. Age-related macular degeneration may pro- 
vide the best example of a common disease in which heritability is 
substantially explained by a small number of common variants of large 
effect”®, but for other conditions, such as Crohn’s disease, the propor- 
tion of heritability explained is not nearly so large despite a much 
larger number of identified variants*' (Table 1). There are no obvious 
differences between these two traits in genetic architecture as pre- 
dicted from clinical and epidemiological data that would explain 
the differences observed in their allelic architecture. Some apparent 
differences may simply be due to differences in the stage of investiga- 
tion across traits. Studies in several conditions have clearly demon- 
strated that the number of detected variants increases with increasing 
sample size***. 

Population genetic theory suggests an explanation for the paucity 
of variants explaining a large proportion of disease predisposition, in 
that decreased reproductive fitness should typically act to reduce the 
frequencies of high-risk variants. This might explain the relative lack 
of variants detected so far for some neuropsychiatric conditions, such 
as autism spectrum disorders, given their low reproductive fitness”. 
Yet for a condition such as type 1 diabetes, which has a similar pre- 
valence, familial risk, early onset and poor reproductive fitness (at 
least before the discovery of insulin therapy), more than 40 loci have
already been reported; this might be because the overall sample sizes 
studied in type 1 diabetes have been very large*®”’. Present-day repro- 
ductive fitness may correlate poorly with the forces that have shaped 
variation throughout human evolution; moreover focusing on the 
reproductive effects of a single disease ignores the pleiotropic effects 
(effects of the same variant on multiple characteristics or disease 
risks) of multiple alleles influencing that condition simultaneously 
with many other conditions”. 

Selection might also be responsible for keeping genetic effect sizes 
low, as variants of larger effect may be selected against and eventually 
disappear’. Long-term stabilizing selection minimizes the produc- 
tion of individuals at the extremes of a trait”, in part by reducing the 
additive genetic effects of alleles already present or those arising de 
novo by mutation” to levels potentially beneath the ability of studies 
of feasible size to detect them. Selection may also contribute to dif- 
ferences in the ability to detect loci in different complex diseases, if 
genetic susceptibility to some diseases is more strongly affected by 
selection than other diseases, or if environmental perturbations vary 
in intensity across diseases. Immune and infectious agents have been 
recognized as among the strongest selection pressures in human 
evolution*', and immune-related genes have been strongly impli- 
cated in Crohn’s disease and other immune-mediated diseases’, sug- 
gesting either that pleiotropic effects of these variants reduce the 
efficiency of negative selection, or that strong environmental per- 
turbation in modern societies might expose the disease risk asso- 
ciated with these variants. Selection may thus explain why disease 
allele frequencies are low and allelic effects are small, but this should 
manifest as low, rather than missing, heritability. 

A probable contributor to the small genetic effect sizes observed so 
far is that current investigations have incompletely surveyed the 
potential causal variants within each gene. Relative risks observed 
for marker SNPs may underestimate the actual risks associated with 
the true causal variants. Notably, 11 out of 30 genes implicated as 
carrying common variants associated with lipid levels also carry 
known rare alleles of large effect identified in Mendelian dyslipide- 
mias, including ABCA1, PCSK9 and LDLR**”’, suggesting that genes 
containing common variants with modest effects on complex traits 
may also contain rare variants with larger effects. 

An important consideration is that the overwhelming majority of 
GWAS and other genetic studies have been limited to European 
ancestry populations, whereas genetic variation is greatest in popula- 
tions of recent African ancestry’, and studies in non-Europeans have 
yielded intriguing new variants****. Studies of populations of recent 
African ancestry in particular are likely to increase the yield of rare
variants and narrow the large chromosomal regions of association 
identified in the ‘younger’ population due to extended linkage dis- 
equilibrium, or the tendency for adjacent genetic loci to be inherited 
together’. Isolated populations may also be of value given their 
potential to be enriched in unique variants”. 

The accuracy of current heritability estimates is also important, 
because experimentally identified variants could never explain all the 
variance in an erroneously inflated heritability estimate. Heritability 
of quantitative traits, formally defined as the proportion of pheno- 
typic variance in a population attributable to additive genetic factors 
(narrow-sense heritability, h² (ref. 36)) is typically estimated from


Table 1 | Estimates of heritability and number of loci for several complex traits

Disease                            Number of loci   Proportion of heritability explained   Heritability measure
Age-related macular degeneration   5                50%                                    Sibling recurrence risk
Crohn's disease                    22               20%                                    Genetic risk (liability)
Systemic lupus erythematosus       6                15%                                    Sibling recurrence risk
Type 2 diabetes                    18               6%                                     Sibling recurrence risk
HDL cholesterol                    7                5.2%                                   Residual* phenotypic variance
Height                             40               5%                                     Phenotypic variance
Early onset myocardial infarction  9                2.8%                                   Phenotypic variance
Fasting glucose                    4                1.5%                                   Phenotypic variance

* Residual is after adjustment for age, gender, diabetes.
family studies, and can be expected to vary across environments.
Narrow-sense heritability estimates in humans can be inflated if 
family resemblance is influenced by non-additive genetic effects 
(dominance and epistasis, or gene–gene interaction), shared familial
environments, and by correlations or interactions among genotypes 
and environment**’’. However, heritabilities estimated from pedi- 
gree studies in animals agree well with heritability estimated from 
response to artificial selection, suggesting that estimates from family 
studies are not necessarily inflated. 

Teasing apart the contributions to heritability of environmental 
factors shared among relatives will soon be possible because the 
availability of genome-wide markers now provides empirical esti- 
mates of identity-by-descent (IBD) allele sharing between pairs of rela- 
tives. For example, full sibs share on average half their genetic com- 
plement, but this proportion can vary—in one large study it ranged 
from 0.37 to 0.62 (ref. 38). By relating phenotypic differences to the 
observed IBD sharing fraction among sib pairs, marker data were used 
to generate a heritability estimate of 0.8 for height**. This is remarkably 
consistent with estimates using traditional methods but free of their 
assumptions, suggesting that for height at least, heritability is not over- 
estimated. Applying such estimation to distantly related or ‘unrelated’ 
individuals is now feasible using dense genomic scans”; given the num- 
ber of people with dense genotyping data, heritability estimates could be 
generated for a wide variety of traits free of potential confounding by 
unmeasured shared environment. 

Improving estimates of all contributors to heritability will facilitate 
determination of the proportion of genetic variance that has been 
explained. Despite imprecision in current estimates, it may still be 
possible to know that ‘all the heritability’ has been explained by
predicting phenotypes in a new set of individuals from trait-associated
markers, and correlating the predicted phenotypes with the actual 
values. If the markers truly explain all the additive genetic variance, 
the squared correlation between predicted and actual phenotype will 
be equal to the heritability*’. Population-based heritability estimates 
thus provide a valuable metric for completeness of available genetic 
risk information, but individualized disease prevention and treatment 
will ultimately require identifying the variants accounting for risk in a 
given individual rather than on a population basis. 
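
The check described above, correlating marker-predicted phenotypes with observed ones, can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions that are not taken from the article: markers are independent (no linkage disequilibrium), genotypes follow Hardy-Weinberg proportions, the true marker effects are known exactly, and all function names, sample sizes and effect distributions are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_r2(h2=0.5, n_people=5000, n_markers=20, seed=1):
    """Simulate a purely additive trait with heritability h2 and return the
    squared correlation between marker-predicted and observed phenotype."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.5) for _ in range(n_markers)]    # hypothetical allele frequencies
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]    # hypothetical per-allele effects
    # Additive genetic variance under Hardy-Weinberg: sum of 2p(1-p)b^2.
    var_g = sum(2 * p * (1 - p) * b * b for p, b in zip(freqs, effects))
    # Environmental s.d. chosen so that var_g / (var_g + var_e) equals h2.
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    predicted, observed = [], []
    for _ in range(n_people):
        g = 0.0
        for p, b in zip(freqs, effects):
            dosage = (rng.random() < p) + (rng.random() < p)     # 0, 1 or 2 copies
            g += dosage * b
        predicted.append(g)                                      # prediction from markers alone
        observed.append(g + rng.gauss(0.0, sd_e))                # add environmental noise
    return pearson(predicted, observed) ** 2


if __name__ == "__main__":
    for h2 in (0.5, 0.8):
        print(h2, round(simulate_r2(h2=h2), 3))
```

When the markers capture all of the additive variance, as they do by construction here, the printed squared correlation should land close to the chosen h2; if some markers were left out of the prediction, it would fall below h2.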


Rare variants and unexplained heritability 


Much of the speculation about missing heritability from GWAS has 
focused on the possible contribution of variants of low minor allele 
frequency (MAF), defined here as roughly 0.5% < MAF < 5%, or of 
rare variants (MAF < 0.5%). Such variants are not sufficiently fre- 
quent to be captured by current GWA genotyping arrays’**', nor do 
they carry sufficiently large effect sizes to be detected by classical 
linkage analysis in family studies (Fig. 1). Once MAF falls below 
0.5%, detection of associations becomes unlikely unless effect sizes 
are very large, as in monogenic conditions. For modest effect sizes, 
association testing may require composite tests of overall ‘mutational 
load’, comparing frequencies of mutations of potentially similar 
functional effect in cases and controls. 

Low frequency variants could have substantial effect sizes (increas- 
ing disease risk two- to threefold) without demonstrating clear 
Mendelian segregation, and could contribute substantially to missing 
heritability’. For example, 20 variants with risk allele frequency of 1% 
and allelic odds ratio (or probability of an event occurring divided by 
the probability of it not occurring, compared in people with versus 
without the risk allele) of three would account for most familial 
aggregation of type 2 diabetes. There are relatively few examples of 
such variants contributing to complex traits, possibly owing to insuf- 
ficiently large sample sizes or insufficiently comprehensive arrays. 
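
The odds ratio defined in parentheses above can be computed directly from carrier counts. This is a minimal sketch; the function name and the counts in the usage lines are hypothetical and are not data from any study cited here.

```python
def odds_ratio(cases_with, controls_with, cases_without, controls_without):
    """Odds of disease among carriers of the risk allele divided by the
    odds of disease among non-carriers."""
    odds_with = cases_with / controls_with            # odds of disease given the risk allele
    odds_without = cases_without / controls_without   # odds of disease without it
    return odds_with / odds_without


if __name__ == "__main__":
    # Hypothetical counts: 30 of 1,000 cases and 10 of 1,000 controls carry the allele.
    print(round(odds_ratio(30, 10, 970, 990), 2))
```

With these illustrative counts the odds ratio is close to three, the effect size used in the type 2 diabetes example above.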

The primary technology for the detection of rare SNPs is sequen- 
cing, which may target regions of interest, or may examine the whole 
genome. ‘Next-generation’ sequencing technologies, which process 
millions of sequence reads in parallel, provide monumental increases 
in speed and volume of generated data free of the cloning biases and 
Figure 1 | Feasibility of identifying genetic variants by risk allele frequency
and strength of genetic effect (odds ratio). Most emphasis and interest lies 
in identifying associations with characteristics shown within diagonal dotted 
lines. Adapted from ref. 42. 


arduous sample preparation characteristic of capillary sequencing”. 
Detection of associations with low frequency and rare variants will be 
facilitated by the comprehensive catalogue of variants with 
MAF ≥ 1% being generated by the 1,000 Genomes Project
(http://www.1000genomes.org/page.php), which will also identify many
variants at lower allele frequencies. The pilot effort of that program 
has already identified more than 11 million new SNPs in initially low- 
depth coverage of 172 individuals”. 

Current mechanisms for using sequencing to identify rare variants 
underlying or co-located with GWA-defined associations include 
sequencing in genomic regions defined by strong and repeatedly repli- 
cated associations with common variants, and sequencing a larger frac- 
tion of the genome in people with extreme phenotypes. In the absence 
of GWA-defined signals, sequencing candidate genes in subjects at the 
extremes of a quantitative trait (such as lipid levels or the age at onset), 
can identify other associated variants, both common and rare***. An 
important finding from these studies is that much of the information is 
provided by people at the extremes of trait distributions, who seem to be 
more likely to carry loss-of-function alleles*”.

Sample sizes used for the initial identification of DNA sequence 
variants have generally been modest, and sample size requirements 
increase essentially linearly with 1/MAF. Much larger samples are 
needed for the identification of associations with variants than those 
needed for the detection of the variants themselves. They also scale 
roughly linearly with 1/MAF given a fixed odds ratio and fixed degree 
of linkage disequilibrium with genotyped markers. Sample size for 
association detection also scales approximately quadratically with 
1/|(OR — 1)|, and thus increases sharply as the odds ratio (OR) 
declines. Sample size is even more strongly affected by small odds 
ratios than by small MAF, so low frequency and rare variants will 
need to have higher odds ratios to be detected. 
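
The two scaling rules in this paragraph can be combined into a rough comparison of sample-size requirements. The sketch below uses only the stated proportionality, n ∝ 1 / (MAF × (OR − 1)²), relative to an arbitrary reference variant; it is not a power calculation, and the function name and reference values are assumptions made for illustration.

```python
def relative_sample_size(maf, odds_ratio, ref_maf=0.05, ref_odds_ratio=1.5):
    """Sample size needed relative to a reference variant, assuming only that
    n scales as 1 / (MAF * (OR - 1)^2). The reference values are arbitrary."""
    if odds_ratio == 1:
        raise ValueError("an odds ratio of 1 means there is no effect to detect")
    needed = 1.0 / (maf * (odds_ratio - 1.0) ** 2)
    reference = 1.0 / (ref_maf * (ref_odds_ratio - 1.0) ** 2)
    return needed / reference


if __name__ == "__main__":
    print(relative_sample_size(0.005, 1.5))   # tenfold rarer allele, same odds ratio
    print(relative_sample_size(0.05, 1.25))   # same frequency, excess odds halved
```

Under this proportionality, a tenfold drop in MAF raises the requirement about tenfold, whereas halving the excess odds (1.5 to 1.25) raises it about fourfold, which is why rarer variants need larger odds ratios to remain detectable.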

Complicating matters further, numerous rare variants may be 
detected in a gene or region but they may have disparate effects on 
phenotype. Common variants have typically been analysed individu- 
ally****, but with one or two carriers of each rare variant, pooling 
them using specific criteria becomes attractive*”*’”°. Pooling variants 
of similar class increases the effective MAF of the class and reduces the 
number of tests performed, but raises several other questions (Box 1). 
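
One simple way to pool rare variants is to collapse them into a single carrier indicator per person and compare carrier frequencies between cases and controls. The sketch below shows that collapsing step only; it is one possible pooling scheme, not necessarily the one used in the studies cited, and the data layout, variant labels and function names are hypothetical.

```python
def collapse_carriers(genotypes, pooled_variants):
    """Return 1 for each person carrying at least one allele from the pooled
    class of rare variants, otherwise 0. Each person is a dict mapping a
    variant label to the number of copies carried."""
    return [
        1 if any(person.get(variant, 0) > 0 for variant in pooled_variants) else 0
        for person in genotypes
    ]


def carrier_frequency(genotypes, pooled_variants):
    """Fraction of people carrying any variant in the pooled class."""
    flags = collapse_carriers(genotypes, pooled_variants)
    return sum(flags) / len(flags)


if __name__ == "__main__":
    pooled = {"v1", "v2", "v3"}                       # hypothetical class of rare variants
    cases = [{"v1": 1}, {"v2": 1}, {}, {"v3": 1}]     # each variant has a single carrier
    controls = [{}, {"v1": 1}, {}, {}]
    print(carrier_frequency(cases, pooled), carrier_frequency(controls, pooled))
```

Each variant in this toy data set is seen in only one case, so no single-variant test could be informative, but the pooled class is carried by three of four cases and one of four controls. Collapsing in this way assumes that the pooled variants act in the same direction, which is one of the questions raised in Box 1.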

Determining which of the multitude of variants carried by an 
individual are responsible for a given phenotype represents a massive 
task, especially if the causal alleles are relatively anonymous in terms 
of known functional consequences. Because only a small proportion 
will have obvious functional consequences for the resultant protein, 
lesser evidence of association may suffice to implicate variants of this 
sort. The best approaches for combining functional credibility and 
statistical support in the evaluation of such variants remain to be 
Box 1 | Research strategies using rare and low frequency variants and structural variants


Research strategies using rare and low frequency and structural 
variants include: (1) using expanding catalogues of human sequence 
variation’*, by linkage disequilibrium of rare/low frequency/structural 
variants with GWA-genotyped SNPs and/or improved detection 
methods, to identify variants underlying association signals identified 
by SNP arrays. (2) Improving approaches for using common SNPs to 
predict and control for differences in rare and low frequency SNPs. (3) 
Using targeted sequencing judiciously, focusing on people with 
extreme or unusual phenotypes. (4) Including populations of recent 
African ancestry in sequencing studies to increase yield of rare variants 
and narrow large linkage disequilibrium blocks; consider isolated or 
founder populations potentially enriched with unique variants. (5) 
Focusing discovery efforts on well-phenotyped groups, accessible 
families with large sibships, and families that allow return to family 
members for iterative phenotyping. (6) Increasing emphasis on other 
structural variants such as inversions and translocations. (7) 
Implementing chromosomal-region-specific matching throughout the 
genome, to select for each case and for each part of their genome—a 
control that is more similar to the case within that genomic region 
rather than matching genome-wide using measures such as 
geographic ancestry. (8) Pooling rare variants for analysis using logical 
criteria, by addressing the questions: do the different rare variants 
increase or decrease disease risk? What classes of variants should be 
pooled? What is the optimal level of MAF for pooling? (9) Improving 
CNV detection by developing more extensive population databases in 
large cohorts to understand allele and mutation frequency, inheritance 
among unaffected individuals, and CNV calling algorithms. 


determined. GWAS have tended to focus almost exclusively on stati- 
stical evidence and de-emphasize considerations of biological plausi- 
bility, but the challenges of sifting through the millions of rare 
variants in which two individuals differ may prompt a return to 
biology if rare variants are to be grouped and analysed properly. 

The sheer number of inter-individual differences, mostly rare, to be 
detected by whole-genome sequencing (roughly 0.4% of 3 billion base 
pairs*') also raises the question of finding appropriate comparison 
subjects, or allelic matches, because people carrying rare variants at 
some loci may have important differences in ancestry or other factors 
from a general population. To reduce the number of variants that 
must be considered in a case-control comparison it would be useful 
to implement chromosomal-region-specific matching throughout 
the genome, to select closely related alleles and regions from the 
comparison population, thereby greatly reducing the number of 
incidental allelic differences from cases. 


Structural variation and unexplained heritability 


Structural variation, including copy number variants (CNVs, such as 
insertions and deletions) and copy neutral variation (such as inver- 
sions and translocations), may account for some of the unexplained 


Table 2 | Selected disease associations with rare CNVs and common CNPs 
heritability if those variants contribute to the genetic basis of human
disease and are incompletely assessed by commercial SNP genotyping 
arrays. Although this type of variation has not been explicitly examined 
in most GWAS until now, CNVs in particular (regions 1 kilobase (kb) 
or longer present in variable numbers across individuals) have gained 
attention as methods to detect them have improved***’. Other forms 
of structural variation such as inversions, translocations, microsatellite 
repeat expansions, insertions of new sequence, and complex rearran- 
gements have been implicated in rare Mendelian conditions. For the 
most part such variation has been largely unexplored in relation to 
complex traits”. 

Variation due to CNVs arises from a combination of rare and 
common alleles; as with SNPs most variants are rare but most of 
the differences between any two individuals arise from a limited set 
of common (MAF ≥ 5%) copy number polymorphisms (CNPs)”.
Disease-associated CNVs detected so far, like disease-associated 
SNPs, include rare variants with large associated effect sizes, and 
common variants with more modest effects but carried by a large 
proportion of the population (Table 2). An added twist is that rare, 
highly penetrant CNVs have generally been large (600 kb–3 megabases
(Mb), affecting many genes), whereas disease-associated common
CNPs have been much smaller (20–45 kb) and have identified
specific genomic features for follow-up study. Because both rare and
common CNVs are under-ascertained by current methods, the relative
effect of these variants will continue to be an important research
question for CNVs just as for SNPs. Of note, CNVs arising de novo in 
current cases and shown to be of importance in neuropsychiatric and 
developmental conditions*”* will not contribute to family resemb- 
lance and heritability, but could explain some of the variation at 
present attributed to ‘environment’. 

Several approaches have been developed for integrating analysis of 
CNVs into GWAS, including innovation in the design of GWA arrays 
(with associated discoveries in neuropsychiatric disorders”) and 
the use of the linkage disequilibrium relationships between SNPs and 
common CNPs (with associated discoveries in Crohn’s disease and 
body weight*”*'). These approaches are early in their development 
and have important limitations, although rapid progress is expected 
as CNV detection algorithms evolve and large-scale sequencing stud- 
ies produce comprehensive, high-resolution maps of segregating 
CNPs that can be measured in large reference panels. 

Many GWA data sets already have sufficient genotype and intens- 
ity information to permit calling of large, rare CNVs even if specific 
CNV probes were not included. As with non-structural single nuc- 
leotide sequence variants, more detailed (‘iterative’) phenotyping in 
relatives may reveal subtle phenotypic effects that were not initially 
appreciated. 


Harnessing family studies 


Family studies provide several opportunities for the investigation and 
interpretation of as-yet-unidentified genetic variation of many types 


Disease           Locus         Type of CNV             Size (kb)   Population frequency   Case frequency       Effect size (OR)
Rare CNVs
Autism/IMR        16p11.2       De novo deletion        600         [illegible]            1%                   100
Autism            16p11.2       Rare duplication        600         [illegible]            0.50%                16
Schizophrenia     1q21.1        Rare deletion           1,400       [illegible]            0.30%                15
IMR               1q21.1        Rare deletion           1,400       [illegible]            0.47%                Not observed in 4,737 controls
Schizophrenia     15q13.3       Rare deletion           1,600       [illegible]            0.20%                12
Epilepsy          15q13.3       Rare deletion           1,600       [illegible]            1.0%                 Not observed in 3,699 controls
IMR               15q13.3       Rare deletion           1,600       [illegible]            0.30%                Not observed in 960 controls
Schizophrenia     22q11.2       Rare deletion           3,000       [illegible]            1%                   40
Common CNPs
Crohn's disease   IRGM          Deletion polymorphism   20          7%                     11%                  1.5
Body mass index   NEGR1         Deletion polymorphism   45          65%                    Quantitative trait   <1 kg
Psoriasis         [illegible]   Deletion polymorphism   30          55%                    65%                  1.3

IMR, idiopathic mental retardation. 
underlying complex diseases (Box 2). Family studies may facilitate
the detection of rare and low frequency variants, and the identifica- 
tion of their associations with common diseases, because predispos- 
ing variants will be present at much higher frequency in affected 
relatives of an index case. 

Family studies also permit the investigation of parent-of-origin-
specific effects, as have been reported for structural variants. If not
properly accounted for, such effects could mask associations and
diminish the proportion of heritability explained. High-density
SNP data in extended pedigrees can be used to localize predisposition
genes, because unexpectedly long runs of identity-by-state (IBS)
sharing among affected relatives suggest true identity by descent
(IBD) that is probably due to an underlying genetic cause. Linkage
data can also enhance the power of high-density GWA scans by
essentially relaxing P-value thresholds in the few instances in which
suggestive findings overlap but are not definitive. Family studies may
also be useful in identifying gene–gene interactions, because affected
relatives are more likely to share two nearby epistatic loci in linkage
disequilibrium that would be unlinked in unrelated individuals.
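
The idea of scanning for long runs of identity-by-state sharing can be illustrated with a minimal sketch. It assumes biallelic SNPs ordered along a chromosome and genotypes coded as minor-allele counts (0, 1 or 2); the function names and the `min_shared` parameter are hypothetical and are not taken from any published method. A long run found this way is only suggestive of IBD and would still need to be judged against marker density and population allele frequencies.

```python
from typing import Sequence, Tuple


def ibs_state(g1: int, g2: int) -> int:
    """Number of alleles (0, 1 or 2) shared identical-by-state at one SNP.

    Genotypes are coded as minor-allele counts 0/1/2 (an assumption of
    this sketch), so the shared count is 2 minus the absolute difference.
    """
    return 2 - abs(g1 - g2)


def longest_ibs_run(a: Sequence[int], b: Sequence[int],
                    min_shared: int = 1) -> Tuple[int, int]:
    """Return (start_index, length) of the longest run of consecutive SNPs
    at which two individuals share at least `min_shared` alleles IBS."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (g1, g2) in enumerate(zip(a, b)):
        if ibs_state(g1, g2) >= min_shared:
            if run_len == 0:
                run_start = i          # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0                # opposite homozygotes break the run
    return best_start, best_len


# Toy genotypes for two affected relatives at eight ordered SNPs.
relative_1 = [0, 2, 1, 1, 2, 0, 0, 2]
relative_2 = [2, 2, 1, 0, 2, 2, 1, 0]
print(longest_ibs_run(relative_1, relative_2))                # (1, 4)
print(longest_ibs_run(relative_1, relative_2, min_shared=2))  # (1, 2)
```

In practice such runs would be compared across many affected relatives, and regions shared far more often than expected would be prioritized for follow-up.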


Strategies for existing and future GWAS 


The nearly 400 GWAS published so far represent a wealth of data
on the genetics of complex diseases. These studies have provided
valuable insights into the genetics of common diseases, particularly 
about the underlying genetic architecture of complex traits and the 
predominance of non-coding variants that may have a role in their 
aetiology. Just as linkage studies demonstrated that complex diseases 
cannot be explained by a small number of rare variants with large 
effects, GWAS have shown that they cannot be explained by a limited 
number of common variants of moderate effect (Fig. 1). The distinc- 
tion between low frequency and truly rare alleles is largely an opera- 
tional one, relating to the potential, given realistic effect sizes, for 
detecting associations with low frequency variants by GWAS at 
attainable sample sizes. Low frequency variants of intermediate effect 
might also contribute to explaining missing heritability that should be 
tractable through large meta-analyses and/or imputation of genome- 
wide association data. 

GWAS will probably remain an efficient way of investigating the 
remaining heritability, because their association signals may well 
define the genomic regions where rare variants, structural variants, 
and other forms of underlying variation are likely to cluster. The value 
of future studies can be enhanced by expanding to non-European
samples and less common diseases and including more precise
phenotypes and measures of environmental exposures (Box 3).
Information on lower frequency alleles emerging from projects such
as the 1,000 Genomes Project will be used to produce even more
comprehensive GWA arrays, and will facilitate the investigation of the
lower frequency spectrum without the need for de novo sequencing.


Box 2 | Using family studies to investigate missing heritability 


To investigate missing heritability using family studies, the following
measures are required: (1) Examine phenotypic effects of rare variants,
particularly for subtle phenotypic abnormalities. (2) Investigate
mutation rates and inheritance patterns of recurrent mutations. (3)
Assess inheritance patterns of rare and structural variants. (4)
Investigate parent-of-origin-specific effects. (5) Enhance power for
identifying associated loci by studying affected sibs, particularly for
conditions with substantial genetic heterogeneity. (6) Identify
associated loci by unexpectedly long runs of identity-by-state sharing
among distantly related affected relatives. (7) Enhance power of GWA
scans by up-weighting P values in preselected regions based on linkage
signals. (8) Identify gene–gene interactions by positive correlations
between family-specific logarithm of odds (lod) scores or evidence of
linkage disequilibrium among unlinked loci.




Potential of research to explain missing heritability 


GWAS were initially designed to focus on the higher end of the 
frequency-effect size spectrum, so much work remains to be done, 
both in finding other variants in the lower frequency and larger effect 
domains shown in Fig. 1, and in understanding their functional and 
pathophysiological properties. To the extent that there are several 
causal variants on a common haplotype or that causal variants are in
imperfect linkage disequilibrium with genotyped markers, marker 
SNPs will underestimate the associated disease risk. 

The modest size of genetic effects detected so far confirms the 
multifactorial aetiology of these conditions and suggests that com- 
plex diseases will require substantially greater research effort to detect 
additional genetic influences. Near-term approaches for finding 
missing heritability on which there seems to be wide agreement 
include: targeted or whole-genome sequencing in people with 
extreme phenotypes, especially those with available family members 
and consent for recontact and iterative phenotyping; use of expanded 
reference panels of genomic variation such as 1,000 Genomes to 
enhance coverage of existing and future GWAS; mining of existing 
GWAS for associations with structural variants and evidence of
gene–gene interactions; improved methods for detection of CNVs and
other structural variants, applied to large, well-phenotyped groups 
and families; and expansion of sample sizes for numerous complex 
diseases through larger individual studies and meta-analyses, includ- 
ing people of non-European ancestry. 

Given all that has been learned of the genetic architecture of 
common diseases in the past few years, it may also be worthwhile 
to attempt exhaustive characterization of some well-studied traits by 
cataloguing all the contributing variation, be it in DNA sequence,
DNA structure, chromatin structure or environmental modifiers, and
defining all its functional implications. Potential criteria for deciding
which traits to pursue aggressively in this way might include the 
strength and robustness of detected associations, evidence that asso- 
ciations are disrupted by varying linkage disequilibrium patterns, 
documented associations of identified loci with multiple traits, and 
public health importance of the traits to be studied. 


Box 3 | Making the most of existing and future GWAS 


The following steps can be used to make the most of existing and
future GWAS: (1) Ensure the wide availability of data with appropriate
protections for consent and privacy. (2) Increase sample sizes and
ensure thorough meta- and mega-analyses of comparable data, with
increased focus on conditions for which only relatively small samples
have been studied so far. (3) Expand studies to non-European samples
and more diverse diseases. (4) Improve phenotyping by expanding to
subtler or more quantitative or precise phenotypes as needed to reduce
heterogeneity or explore pleiotropic effects. (5) Capture a larger
proportion of variation in implicated genes. (6) Enhance the
investigation of the X chromosome, particularly as the methods for
imputation of X and Y markers improve. (7) Investigate gene–gene
interactions, including dominance and epistasis. (8) Investigate
gene–environment interactions: measure environment rigorously and
analyse it against GWA data; examine rare exposures in common
diseases for unusual responders; consider including GWA in
monozygotic twins or migrant studies to identify gene–environment
interactions; conduct suitably large (several hundred thousand
people) prospective cohort studies with GWA genotyping and
reproducible, reliable exposure measures at baseline; include
routine biobanking of material suitable for epigenetic analysis, such as
non-immortalized lymphocytes for DNA methylation or cryopreserved
cell or nuclear preparations for chromatin studies; relate quantitative
phenotypes to epigenetic variation, which unlike SNPs is inherently
quantitative; measure epigenetic variants in appropriate tissues when
technically feasible. (9) Measure CNVs: use linkage disequilibrium
patterns of SNP data and improved maps and imputation methods to
identify common CNPs; use SNP intensity data to identify large CNVs
where feasible; use the best possible CNP typing array until
next-generation sequencing becomes feasible for this purpose.
Explaining missing heritability, however intellectually satisfying,
will probably have fewer practical applications as an end in itself than 
as a means to an end. The ultimate goal of this line of research, as with 
nearly all research in the genetics of complex disease, is to improve 
understanding of human physiology and disease aetiology so that 
more effective means of diagnosis, treatment and prevention can 
be developed. If one or more genetic variants were found that opened
the door to effective new treatments at low cost and with minimal side
effects (LDL-receptor mutations and the statin class of drugs come to
mind), one would probably be content to leave some heritability
unexplained. It is the expectation that associations identified by 
GWAS or other genomic methods will eventually enable effective 
disease prevention or treatment, either through delineation of the 
functional properties of variants recognized at present, or identifica- 
tion of new variants in which true functionality lies, that primarily 
motivates the hunt for missing heritability. 

It is more difficult to imagine predictive variants accounting for a 
sizeable proportion of disease risk without also explaining a sizeable 
proportion of heritability, and the limited incremental value in dis- 
ease prediction of variants identified so far suggests that genetic 
prediction of complex diseases on a population basis will be
challenging. Still, the identification of even many hundreds of risk
variants of small effect should permit identification of the small 
proportion of a population at the highest genetically defined risk, 
in which targeted prevention strategies should be explored. If testing
of such variants were to be conducted across several diseases, as is now
feasible with dense genome-wide association genotyping and will be 
greatly facilitated by whole-genome sequencing, a sizeable number of 
people could be identified to be at greatly increased risk for at least 
one disease. Identification of genetic variants that influence disease 
risk, prognosis, or the response to treatment should enable the 
development of diagnostic and interventional strategies that are safe,
effective and, as necessary, individualized, although the value of
genetic variants in disease prediction and the steps needed to realize
this are widely debated. Given how little has actually been
explained of the demonstrable genetic influences on most common 
diseases, despite identification of hundreds of associated genetic 
variants, the search for missing heritability provides a potentially 
valuable path towards further discoveries. 
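
The claim that testing across several diseases would flag a sizeable number of people can be made concrete with a small worked example. The sketch below assumes, purely for illustration, that each disease flags the same top fraction of the population and that genetic risk is independent across diseases; neither assumption comes from the text, and the function name is hypothetical. Under those assumptions the probability of being flagged for at least one of n diseases is 1 − (1 − p)^n.

```python
def prob_flagged_for_any(top_fraction: float, n_diseases: int) -> float:
    """Probability that a person falls in the top `top_fraction` of genetic
    risk for at least one of `n_diseases` diseases.

    Assumes independent risk across diseases (an illustrative
    simplification; shared loci would make the true figure differ).
    """
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must lie between 0 and 1")
    if n_diseases < 0:
        raise ValueError("n_diseases must be non-negative")
    return 1.0 - (1.0 - top_fraction) ** n_diseases


# Flagging the top 1% of risk for each of 20 diseases.
print(round(prob_flagged_for_any(0.01, 20), 3))  # 0.182
```

Under these assumptions, roughly 18% of a population would be flagged for at least one of 20 diseases, even though only 1% is flagged for any single disease.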
Nucleation, propagation and cleavage of
target RNAs in Ago silencing complexes 


Yanli Wang, Stefan Juranek, Haitao Li, Gang Sheng, Greg S. Wardle, Thomas Tuschl & Dinshaw J. Patel


The slicer activity of the RNA-induced silencing complex resides within its Argonaute (Ago) component, in which the PIWI 
domain provides the catalytic residues governing guide-strand mediated site-specific cleavage of target RNA. Here we 
report on structures of ternary complexes of Thermus thermophilus Ago catalytic mutants with 5’-phosphorylated 
21-nucleotide guide DNA and complementary target RNAs of 12, 15 and 19 nucleotides in length, which define the molecular 
basis for Mg2+-facilitated site-specific cleavage of the target. We observe pivot-like domain movements within the Ago
scaffold on proceeding from nucleation to propagation steps of guide-target duplex formation, with duplex zippering beyond 
one turn of the helix requiring the release of the 3’-end of the guide from the PAZ pocket. Cleavage assays on targets of 
various lengths supported this model, and sugar-phosphate-backbone-modified target strands showed the importance of 
structural and catalytic divalent metal ions observed in the crystal structures. 


Ago is the key component of the RNA-induced silencing complex
(RISC) and has an essential role in guide-strand-mediated target
RNA recognition, cleavage and product release. Ago adopts a bilobal
architecture, composed of amino-terminal PAZ-containing (N and
PAZ) and carboxy-terminal PIWI-containing (Mid and PIWI) lobes.
The PIWI domain adopts an RNase H fold in which the catalytic
Asp-Asp-Asp/His residues contribute to slicer activity; the Mid
domain sequesters the 5’-phosphate of the guide strand; and the
PAZ domain recognizes the 2-nucleotide overhang at the 3’-end of
the guide strand. Ago-mediated target-RNA cleavage requires
Watson–Crick pairing between guide and target, spanning both the
seed segment (positions 2-8) and the cleavage site (10-11 step) as
counted from the 5’-end of the guide strand. Endonucleolytic
cleavage is mediated by Mg2+ cations and generates fragments
containing a 3’-OH for the 5’-segment and a 5’-phosphate for the
3’-segment. Molecular insights into target RNA recognition and
cleavage have emerged from chemical, biophysical and structural
studies, with potential application of RNA-interference-based
approaches as a therapeutic modality against a range of human
diseases.
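
The pairing requirement described above can be sketched as a simple check. This is a minimal illustration, not a model of Ago activity: it reads the requirement literally as Watson–Crick pairing at guide positions 2-8 and at positions 10 and 11, which is a necessary condition in the text rather than a sufficient one. The function names are hypothetical. The guide sequence is the one printed in Fig. 3a; the 19-nucleotide target is written 3’ to 5’ so that its first base lies opposite guide position 1.

```python
# (guide DNA base, target RNA base) combinations that form Watson-Crick pairs
WATSON_CRICK = {("A", "U"), ("T", "A"), ("G", "C"), ("C", "G")}

SEED = range(2, 9)        # guide positions 2-8, counted from the 5' end
CLEAVAGE_SITE = (10, 11)  # the 10-11 step


def paired_positions(guide_5to3: str, target_3to5: str) -> list:
    """Guide positions (1-based) that form a Watson-Crick pair with the
    target base directly opposite. The target is given 3'->5' so that
    index i of both strings refers to the same duplex position."""
    return [
        i + 1
        for i, (g, t) in enumerate(zip(guide_5to3.upper(), target_3to5.upper()))
        if (g, t) in WATSON_CRICK
    ]


def pairs_seed_and_cleavage_site(guide_5to3: str, target_3to5: str) -> bool:
    """True if every seed position and both cleavage-site positions pair."""
    paired = set(paired_positions(guide_5to3, target_3to5))
    return all(p in paired for p in SEED) and all(p in paired for p in CLEAVAGE_SITE)


GUIDE = "TGAGGTAGTAGGTTGTATAGT"      # 21-nt guide DNA, 5'->3' (Fig. 3a)
TARGET_19 = "GCUCCAUCAUCCAACAUAU"    # 19-nt target RNA, written 3'->5'

print(paired_positions(GUIDE, TARGET_19))              # positions 2 through 19
print(pairs_seed_and_cleavage_site(GUIDE, TARGET_19))  # True

# A single mismatch opposite guide position 10 fails the check.
MISMATCHED = TARGET_19[:9] + "G" + TARGET_19[10:]
print(pairs_seed_and_cleavage_site(GUIDE, MISMATCHED))  # False
```

Position 1 is reported as unpaired in this example, in line with the structural observation that the base at position 1 of the guide is splayed out and contacts the protein instead of the target.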

We have previously reported on crystal structures of T. thermophilus
Ago bound to 5’-phosphorylated 21-nucleotide guide DNA (binary
complex), and with added 20-nucleotide target RNA (ternary
complex) (see Supplementary Materials for a summary of these
results). A major limitation of the earlier structural study of the ternary
complex was that the bases of the target RNA could not be monitored
at the 10-11 cleavage site: the mismatches incorporated at these steps
to prevent cleavage left the electron density there disordered. The
catalytic activity of the RNase H fold of the PIWI domain of
T. thermophilus Ago originates in Asp residues 478, 546 and 660, and
hence, in the present study, single Asp to Asn, Glu or Ala mutants were
incorporated at these positions to inhibit the cleavage activity. Ternary
complexes of these catalytic mutants were then generated with
5’-phosphorylated 21-nucleotide guide DNA and fully complementary
target RNAs of varying length (12, 15 and 19 nucleotides), conditions
under which both the seed segment and the cleavage site could
potentially be monitored, thereby providing insights into the
cleavage mechanism.


Cleavage site in Ago ternary complexes 
We have solved the 2.6 Å crystal structure of the Asn 546 catalytic
mutant of T. thermophilus Ago bound to 5’-phosphorylated 21- 
nucleotide guide DNA and a 12-nucleotide target RNA that is fully 
complementary along the length of the duplex (Fig. 1a). This is our 
highest resolution structure of a ternary complex to date (Fig. 1b; 
stereo view in a different perspective in Supplementary Fig. 1a; X-ray
statistics are listed in Supplementary Table 1), and has provided 
detailed insights into the alignment of the guide and target strands 
that span both the seed segment and the cleavage site. The guide DNA 
strand in red can be monitored from positions 1-12 spanning the 
5'-half and for positions 20-21 at the 3’-end, whereas the target RNA 
strand in blue can be monitored for positions 2'-12' (Fig. 1b). Both 
ends of the guide strand are anchored in their respective binding 
pockets despite formation of an 11-base-pair (bp) DNA-RNA 
duplex. Intermolecular contacts within the 12-nucleotide target 
ternary complex are highlighted in Supplementary Fig. 2. Bases 1 
and 2 are splayed, with thymine at position 1 stacked over the side 
chain of Arg 418, and its N3 nitrogen and O4 oxygen hydrogen- 
bonded to the backbone (Met 413) and the side chain (Asn 436) of 
the Ago scaffold (Fig. 1c). Base 1 is the only residue on the guide 
strand that makes base-specific contacts with the Ago scaffold, and 
this observation is consistent with the reported sorting of small RNAs 
in Arabidopsis Ago complexes by the 5’-terminal nucleotide.
The guide DNA-target RNA duplex spanning positions 2 to 12
(Fig. 1d) superimposes better on an A-form helix than on its
B-form counterpart (Supplementary Fig. 3a and b, respectively), with
the scissile phosphate (10-11 step) on the target strand positioned 
opposite the catalytic residues (Asp 478, Asp 660 and Asn 546 mutant) 
of the RNase H fold of the PIWI domain (Fig. 1d, e). Bases 10 and 11 of 
the target strand stack on each other in a catalytically competent 
helical conformation in the ternary Ago complex (Fig. 1f), in contrast 
to the orthogonal arrangements of these bases owing to the insertion 
of Arg 548 between them in the binary Ago complex (compare
Figure 1 | Crystal structure of T. thermophilus Ago(Asn 546) catalytic
mutant bound to 5’-phosphorylated 21-nucleotide guide DNA and 12- 
nucleotide target RNA. a, Sequence of the guide DNA-target RNA duplex. 
The traceable segments of the bases of the guide DNA and target RNA in the 
structure of the ternary complex are shown in red and blue, respectively. 
Disordered segments of the bases on both strands that cannot be traced are 
shown in grey. b, View of the 2.6 Å crystal structure of the Ago ternary
complex. The Ago protein domains (N in cyan, PAZ in magenta, Mid in 
orange, PIWI in green) and linkers (L1 and L2 in grey) are colour-coded. The 
bound 21-nucleotide guide DNA (red) is traced for bases 1-12 and 20-21, 
whereas the bound 12-nucleotide target RNA (blue) is traced for bases 
2'-12'. Backbone phosphorus atoms are yellow. Both ends of the bound 
guide DNA are anchored. c, Expanded view of the ternary complex 
highlighting the alignment of guide DNA (1-3) and target RNA (2’-3’), 
where the bases of the 1-2 step of the guide strand are splayed. Note the 
intermolecular hydrogen-bonding of the Watson—Crick edge of T1 with the 


Supplementary Fig. 4a (binary) with 4b (ternary)). Conformational 
changes in both the guide strand (Supplementary Fig. 5a) and Ago 
(Supplementary Fig. 5b) accompany the transition from binary to 
ternary complex formation (Supplementary Fig. 6 and Supplemen- 
tary Movie 1). 


Release of guide 3’-end from PAZ pocket 


Next we solved the 3.05 Å crystal structure of the Glu 546 catalytic
mutant of T. thermophilus Ago bound to 5’-phosphorylated 21- 
nucleotide guide DNA and a 15-nucleotide target RNA that is fully 
complementary along the length of the duplex (Fig. 2a; stereo view in 
Supplementary Fig. 1b; X-ray statistics are listed in Supplementary 
Table 1). The guide DNA strand can be monitored from positions 
1-16, whereas the target RNA strand can be monitored from 
positions 2’-15' (Fig. 2b). The 5'-phosphate of the guide strand is 
still anchored in the Mid pocket, but the 3'-end (positions 17-21 are 
disordered and cannot be traced) is released from the PAZ pocket on 
formation of the 14-bp duplex spanning positions 2-15 of the guide 
strand. The molecular basis for the release of the 3’-end of the 
guide strand is that the helical conformation for nucleotides 12-15 
disallows the 3’-end from reaching the binding pocket in the PAZ 
domain. 

We observe conformational changes on proceeding from the ternary 
Ago complex with bound 12-nucleotide target (Fig. 2c) to its counter- 
part with bound 15-nucleotide target (Fig. 2d), and these changes can 
backbone amide carbonyl of Met 413 and side chain of Asn 436, as well as the
positioning of phosphate 1 of the guide strand in the Mid binding pocket. A
Mg2+ cation (purple) coordinates to phosphates 1 and 3 of the guide strand,
as well as to an inserted carboxylate of Val 685 from the C terminus.

d, Expanded view of the ternary complex highlighting the guide DNA 
(1-12)-target RNA (2'—12’) duplex, together with the catalytic residues 
(Asp 478, Asp 660 and Asn 546 mutant) of the RNase H fold of the PIWI 
domain. The scissile phosphate group at the 10’—11' step of the target RNA is 
indicated by a red arrow. e, Expanded view highlighting the positioning of 
the backbone phosphate linking the 10’—11’ step (phosphorus coloured in 
magenta) of the target RNA relative to the catalytic residues (Asp 478, 

Asp 660 and Asn 546 mutant) in the ternary complex. f, Positioning of the 
side chain of Arg 548 relative to the guide DNA (6—12)-target RNA (6’—12’) 
duplex. Note the intermolecular contacts between the sugar-phosphate 
backbone of the guide strand and side chains of the protein in the ternary 
complex. 


be visualized after superpositioning of the PIWI-containing (Mid and 
PIWI) lobe as shown by the yellow arrow in Fig. 2e (also see Sup- 
plementary Movie 2). These changes involve a pivotal rotation of the 
PAZ domain (compare PAZ domain alignments in Supplementary 
Fig. 7a and b), as well as movement of loops L1 and L2 located on 
the nucleic-acid-interfacing surface of the PIWI domain (Fig. 2f). 

Details of intermolecular contacts between loop L1 and the guide 
DNA 11-12 segment in the 12-nucleotide target RNA ternary complex 
are shown in Fig. 2g, whereas intermolecular contacts between loops 
L1 and L2 and the guide DNA 11-15 segment in the 15-nucleotide 
target RNA ternary complex are shown in Fig. 2h. Notably, L1 changes
from a loop (Fig. 2g) to a β-turn (Fig. 2h) on proceeding from the
12- to the 15-nucleotide target RNA ternary complexes, resulting in
several extra hydrogen bonds within this β-turn and with loop L2,
thereby stabilizing this new conformation. The conformational
transitions in loops L1 and L2 are required to avoid steric clashes with
the DNA guide strand (Supplementary Fig. 8) on addition of three more
base pairs on proceeding from the 12- to the 15-nucleotide target
RNA ternary complexes. Unexpectedly, changes in the conformation
of loop L1 force the attached β-strand encompassing residues
489-493, as part of a multi-stranded β-sheet, to slide by a single
residue with the accompanying flip of the entire β-strand and its side
chains, on proceeding from the 12- to the 15-nucleotide target ternary
complex (Fig. 2i, identified by a black double-edged arrow in Fig. 2f
and Supplementary Fig. 9).
Figure 2 | Crystal structure of T. thermophilus Ago(Glu 546) catalytic
mutant bound to 5’-phosphorylated 21-nucleotide guide DNA and 15-
nucleotide target RNA. a, Sequence of the guide DNA-target RNA duplex,
with traceable segments colour-coded as in Fig. 1a. b, View of the 3.05 Å
crystal structure of the Ago ternary complex, colour-coded as outlined in
Fig. 1b. The bound 21-nucleotide guide DNA (red) is traced for bases 1-16,
whereas the bound 15-nucleotide target RNA (blue) is traced for bases
2'-15'. Only the 5’-end of the guide DNA is anchored in this ternary
complex. c, d, Comparison of the crystal structures of mutant

Ago(Asn 546)—12-nucleotide target (c) and of mutant Ago(Glu 546)-15- 
nucleotide target (d) ternary complexes. The Ago protein is shown in a 
surface representation with domains and linkers colour-coded as in Fig. 1b. 
The guide DNA (red) and target RNA (blue) are shown in stick 
representation with backbone phosphorus atoms in yellow. e, View of the 
alignment of mutant Ago(Asn 546)—12-nucleotide target complex 
(magenta) and mutant Ago(Glu 546)-15-nucleotide target complex (silver), 
after superpositioning of their PIWI-containing (Mid and PIWI) modules. 
The yellow arrow indicates the magnitude of the conformational change on 
proceeding from the 12-nucleotide target to 15-nucleotide target ternary 
complexes. f, Conformational changes in loop 1 (residues 479-488, red 
arrow) and loop 2 (residues 505-516, green arrow) of the PIWI domain on 


In mechanistic terms, we favour the view that the conformational 
transitions in loops L1 and L2 and associated sliding and flipping of the 
β-strand are triggered by widening of the substrate-binding channel
between the PIWI and N domains to accommodate a lengthening of 
the A-form duplex from 11-bp in the 12-nucleotide target RNA com- 
plex to 14-bp in the 15-nucleotide target RNA complex. Such changes 
not only push the PAZ domain away but also release the 3’ end of guide 
strand from the PAZ-binding pocket (Figs 1b, 2b and Supplementary 
Fig. 7). Moreover, we note that sliding and flipping of the β-strand
proceeding from the 12-nucleotide target ternary complex (magenta) to the
15-nucleotide target ternary complex (silver). Only the DNA—RNA duplex 
for the 15-nucleotide target ternary complex is shown in cyan in a surface 
representation. Loops 1 and 2 are coloured light red (labelled L1’) and light 
green (labelled L2’) in the 12-nucleotide target ternary complex, and dark 
red (labelled L1) and dark green (labelled L2) in the 15-nucleotide target 
ternary complex. The β-strand involved in sliding is highlighted by a black
double-edged arrow. g, Ternary complex containing 12-nucleotide target 
RNA. Residues 11 and 12 of the guide strand are in red, and loops L1' and L2’ 
are in light red and light green, respectively. h, Ternary complex containing 
15-nucleotide target RNA. Residues 11 to 15 of the guide strand are in red, 
and loops L1 and L2 are in dark red and dark green, respectively. Loop L1 
switches to a β-turn aligned by hydrogen bonding within the turn and also
with loop L2, thereby stabilizing this turn conformation. The main-chain of 
Glu 512 forms a hydrogen bond with the phosphate group of residue 14 of 
the guide DNA. The positively charged side chains of Arg 513 and Arg 486 
interact with the backbone of the DNA guide strand, as indicated by blue 
arrows. i, Ribbon representation of the sliding of the β-strand (Gly 489 to
Val 494) by one residue, and conformational transition in adjacent L1 loop 
on proceeding from the 12-nucleotide target RNA ternary complex 
(magenta) to 15-nucleotide target RNA ternary complex (silver). 


occurs with minimal perturbation of β-sheet formation (schematic in
Supplementary Fig. 9), and flipping of the entire β-strand does not
disrupt specific side-chain interactions. 

We have compared the structures of Ago mutant ternary com- 
plexes with 12-nucleotide (Fig. 1b) and 15-nucleotide (Fig. 2b) target 
RNAs reported in this study with the previously reported structure of 
the ternary complex of wild-type Ago with 20-nucleotide target RNA 
containing a pair of mismatches at the cleavage site. The previous
structure of the ternary complex (two molecules in the asymmetric 


©2009 Macmillan Publishers Limited. All rights reserved 


NATURE|Vol 461|8 October 2009 


unit) and one solved recently in a different crystal form (one molecule
in the asymmetric unit; X-ray crystallographic statistics in
Supplementary Table 2) in which segment 2-9 is fully paired and 
both ends of the guide strand are anchored, are most similar to the 
ternary complex with 12-nucleotide target RNA in the present study, 
in which segment 2-12 is fully paired and both ends of the guide 
strand are also anchored (comparison outlined in Supplementary 
Fig. 10a, b). 

Our studies resolve a mechanistic issue related to guide-strand- 
mediated recognition and cleavage of target RNA within Ago com- 
plexes. Several groups have proposed a ‘two-state’ model in which the 
guide strand is anchored at both of its ends during the nucleation step
of target recognition, but its 3’-end is released from the PAZ pocket
owing to topological constraints, after propagation of the duplex
towards the 3’-end of the guide strand. An alternative ‘fixed-end’
model proposed that both ends of the guide strand remain
anchored during the nucleation and the propagation steps of RNA
recognition. Our results support a two-state mechanism for the
system under study, given that our structures demonstrate that 
both ends of the guide strand are anchored in a ternary complex 
containing one turn of the A-form helix (12-nucleotide target 
RNA) spanning the seed segment and cleavage site (Fig. 1b), but 
the 3’-end is released from the PAZ pocket on extending this duplex 
by three more base pairs (15-nucleotide target RNA) towards the 
3'-end of the guide strand (Fig. 2b). 


[Figure 3a: sequence of the guide DNA-target RNA duplex. Guide DNA (positions 1-21): 5’-p-TGAGGTAGTAGGTTGTATAGT-3’; target RNA: 3’-GCUCCAUCAUCCAACAUAU-5’.]

Figure 3 | Crystal structure of T. thermophilus Ago(Asn 478) catalytic 
mutant bound to 5’-phosphorylated 21-nucleotide guide DNA and 19- 
nucleotide target RNA and identification of Mg2+ binding sites within the
catalytic pocket of the wild-type Ago complex. a, Sequence of the guide
DNA-target RNA duplex, with traceable segments colour-coded as in Fig. 1a.
b, View of the 2.8 Å crystal structure of the ternary complex, colour-coded as
outlined in Fig. 1b. The bound 21-nucleotide guide DNA (red) is traced for 
bases 1-16, whereas the bound 19-nucleotide target RNA (blue) is traced for 
bases 2’-16'. Only the 5’-end of the guide strand is anchored in this ternary 
complex. c, Expanded view of the 19-nucleotide target ternary complex 
highlighting blocking of propagation of the guide DNA-target RNA duplex 
beyond pair 16 by the N domain. Base 16 of the guide strand stacks over the 
aromatic ring of Tyr 43, whereas base 16’ of the target strand stacks over 
N domain blocks guide-target pairing beyond position 16


The 2.8 Å crystal structure of the Asn 478 catalytic mutant of
T. thermophilus Ago bound to 5’-phosphorylated 21-nucleotide 
guide DNA and a 19-nucleotide target RNA (sequence in Fig. 3a, 
structure in Fig. 3b, stereo view in Supplementary Fig. 1c; X-ray 
statistics are listed in Supplementary Table 1) is similar to the 
Ago(Glu 546) catalytic mutant ternary complex with 15-nucleotide 
target RNA (Fig. 2b), except that one extra base pair can be traced, 
allowing monitoring of 15-bp of guide-target duplex spanning 
positions 2-16 of the guide strand (stereo electron density maps of 
the guide and target strands are shown in Supplementary Fig. 11). 
Intermolecular contacts within the 19-nucleotide target ternary
complex are highlighted in Supplementary Fig. 12. Furthermore,
the sugar-phosphate backbone of the target strand is intact at the
10-11 step, and on either side of it, for both 15- and 19-nucleotide
target ternary complexes (see Fo − Fc omit maps contoured at 3.7σ in
Supplementary Fig. 13a and b, respectively).

An unexpected mechanistic insight to emerge from our structural 
studies of the three ternary Ago complexes outlined earlier is that the 
guide DNA-target RNA duplex retains the A-form duplex architec- 
ture spanning the seed segment, the cleavage site and observable 
elements towards the 3’-end of the guide strand (up to position 
16), and it is solely the Ago scaffold that adjusts by pivot-like domain 
movements, to relieve the topological stress associated with zippering 
up the RNA target through pairing with its guide-strand template. A 


Pro 44. d, Intermolecular hydrogen-bonding contacts between the sugar- 
phosphate backbone of the 10’-13’ target RNA segment and backbone and 
side chains of the PIWI domain in the 19-nucleotide target ternary complex. 
e, f, Fo − Fc omit maps (blue colour, contoured at 3.5σ) of the 9’-12'
segment of bound RNA and catalytic Asp 478, Asp 546 and Asp 660 residues
in the 3.3 Å structures of the ternary complexes in 50 mM Mg2+ (e, space
group P4₃2₁2, one molecule in the asymmetric unit) and in 80 mM Mg2+
(f, space group P2₁2₁2₁, two molecules in the asymmetric unit). Bound Mg2+
cation(s) were identified in omit maps contoured in purple at 6.0σ as
outlined in e and f, based on coordination to several oxygen atoms in an
approximate octahedral geometry. One bound Mg2+ cation can be assigned
in the ternary complex in 50 mM Mg2+ in e, and two bound Mg2+ cations
can be assigned in the ternary complex in 80 mM Mg2+ in f.
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second unanticipated observation is that the N domain blocks 
propagation of the guide DNA-target RNA duplex beyond position 
16 in the 19-nucleotide target ternary complex (Fig. 3c), with the base 
at position 16 of the guide strand stacking on the aromatic ring of 
Tyr 43, and the base at 16’ of the target strand stacked over the Pro 44 
ring. Thus, base pairing is disrupted for steps 17, 18 and 19, with 
anticipated trajectories for the separated guide and target strands 
schematized in Supplementary Fig. 14. 

The sugar-phosphate backbone spanning the seed segment of the
guide but not the target strand is hydrogen-bonded to the protein (see
Supplementary Movie 3). We also note that the sugar-phosphate
backbone of the target RNA spanning the 10′-13′ segment forms
intermolecular hydrogen bonds with the Ago scaffold in the 12-nucleotide
(Supplementary Fig. 15a), 15-nucleotide (Supplementary Fig. 15b)
and 19-nucleotide (Fig. 3d) target ternary complexes, establishing
the potential for photochemically facilitated cross links between this
segment of the target RNA and its spatially identified proximal sites on
the protein.


A pair of Mg²⁺ cations mediates cleavage chemistry


The PIWI domain of Ago adopts an RNase H fold, with catalytic
Asp 478, Asp 546 and Asp 660 residues lining the active site of the
T. thermophilus enzyme. Two Mg²⁺ cations have been shown to
facilitate RNA hydrolysis during catalytic cleavage by RNase-H-containing
nucleases, with cation A assisting nucleophilic attack by positioning
and activating a water molecule, and cation B stabilizing the transition
state and leaving group. Because catalytic mutations could induce
distortions of the optimal geometry for coordination to divalent
cations, we attempted to identify bound Mg²⁺ cation(s) in the catalytic
pocket of the ternary complex of wild-type T. thermophilus Ago with
19-nucleotide target RNA that is fully complementary to positions
2-19 of the guide strand (Fig. 3a).

Crystals of the Ago ternary complex were grown as a function of
Mg²⁺ concentration, with 3.3 Å data sets collected for crystals in
50 mM Mg²⁺ (space group P4₃2₁2, one molecule in the asymmetric
unit) and 80 mM Mg²⁺ (space group P2₁2₁2₁, two molecules in the
asymmetric unit) solution (X-ray statistics listed in Supplementary
Table 3). Gel electrophoresis of the crystals established that the
target RNA was not cleaved in either complex, presumably because
T. thermophilus Ago-mediated cleavage is optimal at higher
temperatures and has a marked preference for Mn²⁺ over Mg²⁺ (ref. 11). The
Fo − Fc omit maps (blue colour, contoured at 3.5σ) of the target
strand residues 9′-12′ and catalytic Asp residues for the Ago ternary
structures in 50 mM Mg²⁺ and 80 mM Mg²⁺ are shown in Fig. 3e and
f, respectively. A single bound Mg²⁺, positioned towards the leaving
group side of the scissile phosphate (cation B), can be identified in the
structure in 50 mM Mg²⁺ (Fig. 3e, omit map contoured in purple at
6.0σ), with an intact target RNA readily traceable for the 9′-12′
segment. A pair of Mg²⁺ cations separated by 3.9 Å, which coordinate
the hydrolysis of the scissile phosphate, were identified in the
structure in 80 mM Mg²⁺ (Fig. 3f). The assignment of the extra density to
Mg²⁺ site(s) at 3.3 Å resolution is based on coordination of the
divalent cation(s) to several oxygen atoms in an approximate
octahedral geometry (stereo views in Supplementary Fig. 16a, b). Of
the three catalytic Asp residues lining the catalytic pocket, only Asp 478
coordinates to both Mg²⁺ cations (Fig. 3f and Supplementary Fig. 16b).
The structures of the catalytic residues, Mg²⁺ sites and RNA
backbone for B. halodurans RNase H (1.85 Å) and T. thermophilus
Ago (3.3 Å) complexes are superpositioned in stereo for comparative
purposes in Supplementary Fig. 17. Given that the crystals of the
ternary complexes grown from both 50 and 80 mM Mg²⁺ diffract to
3.3 Å resolution, it is at present not possible to identify the position of
the water molecule that would participate and be positioned for in-line
attack on the scissile phosphate.

We observe detectable conformational changes after
superpositioning of the single and the pair of Mg²⁺-bound ternary
complex structures through their PIWI-containing lobes. These changes
are restricted to the PAZ domain (Supplementary Fig. 18a) and the
target RNA strand (Supplementary Fig. 18b). The catalytic residues
are optimally positioned for cleavage in the structure of the ternary
complex with a pair of Mg²⁺ cations.

Thus, the Ago protein, capitalizing on the RNase H fold of its PIWI
domain, uses three catalytic Asp residues and two Mg²⁺ cations to
facilitate site-specific cleavage of RNA targets, yielding products
containing 5′-phosphate and 3′-OH ends, a feature in common with
members of the retroviral integrase superfamily.


Analysis of the catalytic activity of T. thermophilus Ago 


Target RNA cleaving bacterial complexes are most effectively
reconstituted using single-stranded guide DNA rather than
RNA. To explore whether DNA might also function as a
target, we subjected chemically synthesized DNA and RNA targets
(Supplementary Table 4) to DNA-guided Ago cleavage reactions.
DNA is resistant to hydrolysis by divalent metal ions and high
temperature incubation, thereby yielding a clearer picture of target
cleavage. T. thermophilus Ago loaded with guide DNA derived from the
luciferase sequence studied previously cleaved DNA as well as
RNA targets; however, several unexpected minor cleavage products
were also observed (Supplementary Fig. 19). These side products
resulted from partial self-complementarity of the guide DNA,
leading to cleavage of guide DNA during the Ago loading process
and acceptance of the shorter cleavage products as guide DNAs. We
therefore tested new guide and target sequence pairs, identical to the
microRNA let-7 sequence selected for crystallography. The let-7
guide and target molecules yielded a single cleavage band, with
DNA being a better substrate than RNA (Supplementary Fig. 20a).
Target DNA cleavage occurred in the presence of Mg²⁺ or Mn²⁺, but
not Ca²⁺ (Supplementary Fig. 20b), supporting single and multiple
turnover (Supplementary Fig. 20c). Cleavage products started to
accumulate after a short (about 2 min) lag phase, at an approximate
rate constant of 0.1 min⁻¹ under single turnover (0.5 µM target) and
0.2-0.4 min⁻¹ under multiple turnover (5 µM target) conditions
(Supplementary Fig. 20c). These rate constants indicate that our
cleavage conditions are approaching substrate saturation and that
product release is not rate limiting. We also included cleavage
experiments using mutant Ago proteins that were used for the crystal
structures (Figs 1-3) and tested for DNA-guided RNA (Supplementary
Fig. 21a) or DNA (Supplementary Fig. 21b) target cleavage. Of
the mutant Agos, only the Asn 546 mutant showed some residual
activity, and product formation was reduced >500-fold.
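
To make the reported single-turnover numbers concrete, the following is a minimal sketch that treats product formation as a first-order process starting after the lag phase. The first-order-with-lag model and the function names are illustrative assumptions, not the fitting procedure used for Supplementary Fig. 20c; only the rate constant (0.1 min⁻¹) and the lag (about 2 min) come from the text.

```python
import math

def fraction_cleaved(t_min, k_per_min, lag_min=2.0):
    """Fraction of target cleaved at time t_min (assumed first-order after a lag)."""
    if t_min <= lag_min:
        return 0.0
    return 1.0 - math.exp(-k_per_min * (t_min - lag_min))

def half_time(k_per_min):
    """Time after the lag at which half of the target is cleaved."""
    return math.log(2) / k_per_min

K_SINGLE = 0.1  # min^-1, single turnover (0.5 uM target), from the text

for t in (2, 5, 10, 20, 30):
    print(f"t = {t:2d} min: fraction cleaved = {fraction_cleaved(t, K_SINGLE):.2f}")
print(f"half-time after lag: {half_time(K_SINGLE):.1f} min")
```

Under this assumed model, a multiple-turnover rate constant (0.2-0.4 min⁻¹) that is no lower than the single-turnover value is what one would expect if product release does not limit the reaction.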


Minimal target DNA requirements 


Previously, we showed that luciferase guide DNA strands as short as 9
nucleotides promoted target RNA cleavage; the minimal target
length was not addressed. We first shortened the let-7 DNA target
(Fig. 4a) from its 5′ end (Fig. 4b). Truncation of the target to 16
nucleotides did not alter cleavage activity, but 15- and 14-nucleotide
targets showed 120- and 400-fold reduced cleavage rates,
respectively, and a 12-nucleotide target was not cleaved. This indicates that
residues 17′ and higher do not contribute to cleavage, and was further
supported by our finding that 21- or 24-nucleotide DNA targets, in
which regions 17′-21′ or 17′-24′ were unpaired with same size
guides, showed similar activity compared to their fully paired
versions (Supplementary Fig. 22).

To examine the importance of the 3′ end of the target, we tested
15-nucleotide DNA target strands displaced in 1-nucleotide steps
relative to the let-7 target (Fig. 4c). DNA targets covering 2′-16′,
3′-17′ and 4′-18′ showed cleavage activity similar to or better than
21-nucleotide-long targets, but 100- and 500-fold reduced rates were
obtained for targets covering 5′-19′ and 6′-20′. These experiments
indicate that positions 1′ to 3′ were dispensable for target cleavage.
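
The truncation and displacement results can be combined into a single paired region by intersecting the target windows that retained near-normal activity. The sketch below is only a bookkeeping aid under stated assumptions: the 5′-truncated targets are taken to span positions 1′ up to their length (for example 1′-16′ for the 16-nucleotide target), and the helper names are hypothetical.

```python
# Target windows (first, last position in target numbering) whose cleavage
# was comparable to the full-length target, as reported in the text.
active_windows = [
    (1, 16),  # 16-nt target truncated from its 5' end (assumed 1'-16')
    (2, 16),  # 15-nt target covering 2'-16'
    (3, 17),  # 15-nt target covering 3'-17'
    (4, 18),  # 15-nt target covering 4'-18'
]

# Windows with strongly reduced or no cleavage: (window, reported fold reduction).
impaired_windows = [
    ((1, 15), 120),   # 15-nt 5'-truncated target (assumed 1'-15')
    ((1, 14), 400),   # 14-nt 5'-truncated target (assumed 1'-14')
    ((5, 19), 100),
    ((6, 20), 500),
]

def common_core(windows):
    """Positions shared by every window, returned as (first, last)."""
    first = max(start for start, _ in windows)
    last = min(end for _, end in windows)
    if first > last:
        raise ValueError("windows share no common positions")
    return first, last

def covers(window, core):
    """True if the window includes every position of the core."""
    return window[0] <= core[0] and window[1] >= core[1]

core = common_core(active_windows)
print(f"shared paired region: {core[0]}'-{core[1]}'")
for window, fold in impaired_windows:
    print(window, f"{fold}-fold reduced, covers core: {covers(window, core)}")
```

None of the impaired targets covers the full shared region, which is the pattern expected if pairing across that region is what supports efficient cleavage.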

In summary, positions 4′ to 16′ need to be paired to facilitate
efficient target DNA cleavage when presented to T. thermophilus
Ago loaded with 21-nucleotide guide DNA. On the other hand, guide
DNA as short as 9 nucleotides promoted T. thermophilus Ago cleavage
of target RNA, indicating that base-pairing involving residues 10′ to
16′ per se is not essential. Short guides, in contrast to 21-nucleotide
guides, are unable to occupy the PAZ domain with their 3′ ends.
Therefore, we speculate that transitioning of the Ago ternary complex
into a cleavage-active conformation requires either the release of the
guide 3′-end PAZ interaction or its initial absence as seen for short
guide strands. Release of the PAZ guide 3′-end interactions is driven
by base-pairing including position 16′ of a target.

Figure 4 | Effect of complementarity and length on target DNA cleavage by
T. thermophilus Ago. Cleavage reactions were performed as described in the
Methods, and products were resolved on denaturing polyacrylamide gels; for
DNA sequences, see Supplementary Table 4. a, Schematic of the reference
DNA duplex utilized for length variation experiments; the cleavage site is
indicated by an arrow, the position of the ³²P label by an asterisk.
b, Shortening of the target DNA from its 5′ end. Alterations of the target
DNA and corresponding paired structure are illustrated to the left. Target
DNA cleavage was performed at 65 °C rather than 75 °C to facilitate
hybridization of shortened targets. nt, nucleotides. c, Positional variation of
15-nucleotide target DNAs. For labelling and reaction conditions, see b.

It may seem surprising that the Ago conformations of the 15- and
19-nucleotide target-RNA-containing structures were similar.
However, the thermodynamic stability of DNA-RNA duplexes is different
from that of DNA-DNA duplexes, and fewer but more stable base pairs
may facilitate the switch to the active conformation. In support of
this view, we observed that the cleavage activity for the 15-nucleotide
(positions 1′-15′) and 16-nucleotide (positions 1′-16′) target
RNAs (Supplementary Fig. 23) only differed by 1.4-fold and was
comparable to that of the longer target RNA (Supplementary Fig. 21).

Our crystal structures also indicated that base pairs involving
positions 17′ or higher could not form owing to steric clashes with the
N-terminal domain. To test whether propagation of the duplex
beyond position 16′ could contribute to catalysis, we tested Ago
deletion mutants del(1-106) and del(1-177) but found that they lost
all activity (Supplementary Fig. 21a, b). This suggests that the N
domain also has a crucial involvement in transitioning or stabilizing
the active conformation of the ternary complex, and could possibly
even affect other steps including loading of the guide DNA, which
were not tested.


Target DNA sugar-phosphate backbone role during cleavage


To assess the contribution of sugar and phosphate residues during
target DNA recognition and cleavage, we introduced 2′-hydroxyl
(OH) and 2′-methoxy (OMe) modifications at positions 9′, 10′ or
11′, as well as 2′-fluoro (F) at positions 10′ or 11′ (Fig. 5a,
Supplementary Fig. 24 and Supplementary Table 4). OH, OMe and F
2′-modified ribonucleosides favour the A-helical C3′-endo ribose
conformation, whereas deoxynucleotides are preferable in the
B-helical C2′-endo conformation, and therefore stabilize
double-helical structures. The most profound effects on cleavage were shown
by 2′-substitutions at residue 11′, which is immediately adjacent to
the cleaved phosphodiester bond. The 2′-F substitution enhanced
the single (Fig. 5a) and multiple (Supplementary Fig. 24) turnover
cleavage rate by approximately 4- and 6-fold, respectively, compared
to 2′-H, presumably because the electronegative 2′-F group is able to
stabilize the developing negative charge of the 3′ oxygen leaving group
during the transition state. The cleavage rate was reduced twofold by
2′-OH at residue 11′, and 2′-OMe completely abrogated cleavage,
presumably by affecting the hydration pattern optimal for
stabilization of the transition state. Also, there is no evidence for hydrogen
bonding of the 2′ residue to neighbouring nucleotides or amino-acid
side chains. Taken together, the drastic effects on reaction rates by 2′
modifications at the 11′ position cannot be rationalized by simple
differences in sugar conformation, but rather by a combination of electronic
and steric effects differentially affecting the transition state.
Modifications of the 2′ position one nucleotide removed from the
cleavage site showed less or no effect; in contrast, position 9′ showed
an unanticipated threefold reduction in rate for 2′-OH and 2′-OMe
(Fig. 5a).

Figure 5 | Effect of sugar-phosphate backbone modifications on target
DNA cleavage by T. thermophilus Ago. Cleavage experiments were
performed as described in Methods. a, 2′-fluoro-, 2′-methoxy- and
2′-hydroxyl-substitutions of single 2′-deoxyribose residues of the target DNA
strand at and near the cleavage site. The control target (unmod.) was the
unmodified oligodeoxynucleotide. b, Phosphorothioate modification of the
target DNA. The phosphate configuration (Rp or Sp) of the
phosphorothioate diastereomers is indicated. Cleavage assays were
performed in the presence of either Mg²⁺ or Mn²⁺ cations. Note that the
experiment for the 11′-12′ isomers was a different experiment, in which
overall reaction rates were slower. For the complete experiment see
Supplementary Fig. 25. Sequences of oligonucleotides are in Supplementary
Table 4. c, Structure of the cleavage site modelling the attack of the hydroxyl
nucleophile. Phosphate oxygen and active site carboxylate oxygens
coordinated to metal ions A and B (purple spheres), with distances less than
2.5 Å shown as blue dashed lines. The coordination of the carboxylate
oxygen from Asp 546 to metal ion B is hidden in the projection. The
phosphate oxygens and 2′ residues sensitive to modification are shown as
yellow and green spheres, respectively; R denotes 2′-H, -OH, -F or -OMe.
Red arrows indicate the attack of the hydroxyl nucleophile modelled to be
directly coordinated by metal ion A, and the stabilization of the developing
negative charge of the 3′ oxyanion leaving group by metal ion B.

To probe the role of phosphate oxygens, which can coordinate
structurally or catalytically important divalent metal ions, we
synthesized the mixed phosphorothioate diastereomers located between
residues 8′ and 9′, 9′ and 10′, 10′ and 11′, or 11′ and 12′, and purified
by reverse-phase high-performance liquid chromatography (HPLC)
the Sp form to >85%, and the Rp form to >97% purity. Cleavage
reactions were performed in the presence of either 5 mM Mg²⁺, which
preferably coordinates to oxygen, or 5 mM Mn²⁺, which preferably
coordinates to sulphur. Phosphorothioate substitution at the cleavage
site, positions 10′-11′, showed the most profound effects (Fig. 5b). In
Mg²⁺-containing buffer, the Sp form was inactive and the Rp form was
reduced 200-fold in single-turnover cleavage rates. The loss of activity
of the Rp form was rescued by Mn²⁺, yielding a less than twofold
reduction compared to 2′-H; however, the Sp form remained inactive.
Phosphorothioate substitutions more distant to the cleavage site
either had no effect (Rp and Sp at positions 8′-9′, Sp at positions
9′-10′ and Rp at positions 11′-12′), or were reduced by 15-fold
and by more than 80-fold for Rp at positions 9′-10′ and Sp at positions
11′-12′, respectively, and rescued by Mn²⁺ to less than twofold and
more than 20-fold, respectively. Non-bridging phosphate oxygens
that are sensitive to sulphur substitution and responsive to Mn²⁺
rescue are believed to directly coordinate to Mg²⁺, and the interaction
stabilizes ground and transition states of the cleavage reaction to a
similar degree. A phosphate oxygen sensitive to phosphorothioate
substitution, but without metal ion rescue feature, such as the
pro-Sp oxygen at the cleavage site, probably differentially stabilizes the
transition state versus the ground state. Substituting the 10′-11′
pro-Sp oxygen by sulphur increases the bond length by about
0.6 Å, a distance sufficient to perturb the complex network of
interactions coordinated at this phosphate oxygen (Fig. 5c). The pro-Sp
oxygen is coordinated to metal ions A and B, with A positioning the
attacking hydroxyl ion nucleophile and B stabilizing the leaving 3′
oxyanion. The importance of stabilizing the leaving group was also
documented earlier by the effects of modifications at the adjacent 2′
position. In contrast, the pro-Rp oxygen at the cleavage site is only
coordinated to metal ion A, and the sulphur substitution was rescued
with Mn²⁺, indicating more flexibility for positioning the nucleophile
by metal ion A.
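
The interpretive rule applied to the phosphorothioate data (sensitive to sulphur and rescued by Mn²⁺, versus sensitive but not rescued) can be written down as a small classifier. This is a sketch under stated assumptions: the tenfold cutoff and the function name are hypothetical, only a subset of the reported isomers is entered, "less than twofold" is entered as its upper bound of 2, and "inactive" is represented as infinity.

```python
import math

INACTIVE = math.inf  # no detectable cleavage

# Fold reduction in single-turnover rate as (in Mg2+, in Mn2+), from the text.
observations = {
    ("10'-11'", "Rp"): (200.0, 2.0),
    ("10'-11'", "Sp"): (INACTIVE, INACTIVE),
    ("9'-10'", "Rp"): (15.0, 2.0),
    ("8'-9'", "Rp"): (1.0, 1.0),
    ("8'-9'", "Sp"): (1.0, 1.0),
}

def classify(mg_fold, mn_fold, cutoff=10.0):
    """Label an isomer by thio sensitivity and Mn2+ rescue (cutoff is an assumption)."""
    if mg_fold < cutoff:
        return "insensitive"
    if mn_fold < cutoff:
        return "rescued"      # consistent with direct metal-ion coordination
    return "not rescued"      # consistent with a transition-state-specific role

for (site, isomer), (mg, mn) in observations.items():
    print(f"{site} {isomer}: {classify(mg, mn)}")
```

With these inputs the 10′-11′ Rp and 9′-10′ Rp isomers fall in the rescued class and the 10′-11′ Sp isomer in the not-rescued class, matching the qualitative reading given in the text.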


Structural overview and functional implications 


Our current structures of ternary complexes with catalytic mutants of
T. thermophilus Ago have defined the positioning of the guide
DNA-target RNA A-form duplex relative to the catalytic Asp residues of the
RNase H fold of the PIWI domain, thereby establishing the molecular
basis for site-specific cleavage at the phosphate bridging the 10′-11′
step of the target strand. Further structural studies of ternary
complexes with wild-type Ago have identified two Mg²⁺ cations within
the catalytic pocket, located on either side of the cleavable phosphate,
thereby positioned to mediate the cleavage chemistry. Both ends of
the guide strand are anchored in the ternary complex composed of
one turn of the DNA-RNA duplex spanning the seed segment and
cleavage site, but consistent with a two-state model, the 3′-end is
released from the PAZ pocket after propagation of the guide-target
duplex by three additional base pairs. Notably, the guide DNA and
target RNA form a regular A-form helix spanning a maximum of 15
base pairs (positions 2-16), with the Ago scaffold undergoing
pivot-like domain movements as the target RNA zippers up by pairing with
its guide template.

The kinetic effects of target site phosphorothioate substitution and
2′ modification during Ago-mediated DNA cleavage are rationalized
by the crystal structure, and are consistent with the mechanism of
RNase H cleavage studied in other systems. The absence of
amino-acid side chains able to interrogate whether the target presented at the
active site is RNA or DNA might suggest that DNA could be a more
probable target of this bacterial Ago protein, as seen for other members
of the retroviral integrase superfamily to which Ago proteins belong.


METHODS SUMMARY 


Wild-type and mutant T. thermophilus Ago proteins were overexpressed from
Escherichia coli and purified by chromatography as described previously.
Crystals were obtained by hanging-drop or sitting-drop vapour diffusion.
The ternary Ago complex was generated in a stepwise manner by initially mixing
the protein with 5′-phosphorylated 21-nucleotide guide DNA, followed by
addition of different length target RNAs. All wild-type and mutant Ago complex
structures were determined by molecular replacement using the domains of the
binary Ago complex structure (Protein Data Bank accession code 3DLH) as
search models. Cleavage assays were undertaken with let-7 guide and target
oligonucleotides. Details of all crystallographic and biochemical procedures are
listed in Methods.


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.


METHODS

Crystallization and data collection. Wild-type and mutant T. thermophilus Ago
were prepared as described previously. Oligodeoxynucleotides were purchased
from Invitrogen. RNA oligonucleotides were purchased from Dharmacon. For
crystallization, T. thermophilus Ago was mixed with 5′-phosphorylated 21-nucleotide
guide DNA at a 1:1.2 molar ratio, followed by the addition of different length target
RNAs at a 1:1 molar ratio to the binary mixture, to form the ternary complex. All
crystals were grown at 35 °C.

The mutant Ago protein complexes were crystallized by the sitting-drop vapour
diffusion method. Crystals of catalytic mutant Ago(Asn 546) complexed with
12-nucleotide target RNA were grown in a reservoir containing 2.5 mM spermine,
10 mM MgCl₂, 5 mM CaCl₂, 50 mM sodium cacodylate, pH 6.0, 10% (v/v)
isopropanol. The crystals belong to space group C2, and there is one Ago
complex in the asymmetric unit. Crystals of catalytic mutant Ago(Glu 546)
complexed with 15-nucleotide target RNA were grown in a reservoir containing
1.3 M ammonium tartrate dibasic and 0.1 M Bis-Tris, pH 7.0. The crystals belong
to space group P4₃2₁2, and there is one Ago complex in the asymmetric unit.
Crystals of the catalytic mutant Ago(Asn 478) complexed with 19-nucleotide
target RNA were obtained in a reservoir containing 1.0 M succinic acid, 0.1 M
HEPES, pH 7.0, 1% (w/v) polyethylene glycol monomethyl ether 2,000. The
crystals belong to space group P2₁2₁2₁, and there are two Ago complexes in
the asymmetric unit.

Crystals of wild-type Ago complexed with 19-nucleotide target RNA with one
bound divalent cation in the catalytic pocket were obtained with the hanging-drop
vapour diffusion method. The reservoir solution contained 50 mM MgCl₂, 1.0 M
sodium tartrate, 50 mM Tris-HCl, pH 7.0. The crystals belong to space group
P4₃2₁2, and there is one Ago complex in the asymmetric unit. With additional
30 mM MgCl₂ in both the reservoir and Ago protein, we obtained wild-type Ago
complexed with 19-nucleotide target RNA and two bound divalent cations in the
catalytic pocket. These crystals belong to space group P2₁2₁2₁, and there are two
Ago complexes in the asymmetric unit.

Crystals of a second crystal form of wild-type Ago complexed with 20-nucleotide
RNA target containing adjacent mismatches at the 10-11 step were grown under the
same conditions as described previously.

Diffraction data were collected on beamline NE-CAT ID-24C at the Advanced
Photon Source (APS), Argonne National Laboratory and beamline X-29 at the
Brookhaven National Laboratory. All data sets were integrated and scaled with
the HKL2000 suite and data processing statistics are summarized in
Supplementary Tables 1-3.

Structure determination and refinement. The structures of the complexes were
solved by molecular replacement with the program PHASER. The domains of
the Ago 21-nucleotide guide DNA binary complex structure without the linkers
were used as search models. Model building was done using COOT, and
refinement was done with CNS and PHENIX. The final figures were created with
Pymol (http://pymol.sourceforge.net/). The refinement statistics for all the Ago
mutants and wild-type complexes are summarized in Supplementary Tables 1-3.

Oligonucleotides and separation of isomers. Phosphorothioate-modified and
unmodified oligodeoxynucleotides were obtained from Integrated DNA
Technologies. Rp and Sp diastereomers were separated by HPLC using a Supelco
Discovery C18 column (bonded phase silica 5 µm particle, 250 × 4.6 mm)
following the general method described previously: buffer A was 0.1 M
triethylammonium bicarbonate (TEAB, pH 7.5); buffer B was 40% acetonitrile in 0.1 M
TEAB; flow rate was 1 ml min⁻¹. For the preparative scale ~20 optical density
units (ODUs) (260 nm) of oligodeoxynucleotide (that is, 20 µl of a 1 mM stock
solution) were loaded on the column. Diastereomers of positions 8′-9′, 10′-11′
and 11′-12′ were separated using a two-step gradient, 0-20% B in 2 min followed
by 20-40% B in 40 min (0.5% change per min). The diastereomers of positions
9′-10′ were more difficult to separate and the second step gradient was changed to
20-40% B in 80 min (0.25% change per min). Peak 1 was shown to be 97% pure by
analytical HPLC (same conditions as preparative run, 0.3 ODUs injected). Peak 2
was shown to be 85% pure (Supplementary Fig. 26). Dithiothreitol (DTT) was
added to the collected peak fractions (1 µl of 100 mM DTT to about 2 ml fraction)
before dry down to minimize oxidation of the phosphorothioate. Co-evaporation
with methanol was repeated three times to remove residual TEAB buffer. The
dried-down material was resuspended in 50 µl water and ethanol precipitated to
remove DTT. In each case peak 1 is the Rp form and peak 2 is the Sp form,
consistent with ref. 47. The identity of the purified diastereomers was confirmed
by snake venom phosphodiesterase/alkaline phosphatase treatment and
subsequent HPLC; the Sp-configured dinucleotide was more resistant to
phosphodiesterase compared to the Rp-configured dinucleotide (Supplementary Fig.
27). HPLC for separation of nucleosides and dinucleotide phosphorothioates
used buffer A as 0.1 M triethylammonium acetate in 3% acetonitrile, and buffer
B as 90% aqueous acetonitrile. The elution was performed using a stepwise
gradient starting at 0% B for 15.5 min, followed by 19.5 min of 10% B and 30 min of
100% B at a flow rate of 0.5 ml min⁻¹, using the same HPLC column as indicated
earlier. The first four peaks are digested monomers, consistent for each
oligodeoxynucleotide. Later peaks (after 30 min) are undigested dimers along with
some baseline noise. Elution times of these later peaks depended on the dimer
sequence and phosphorothioate configuration. In each case the Rp form was more
digested than the Sp form (Supplementary Fig. 27).
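
As a quick check on the preparative gradient arithmetic, the sketch below computes the buffer B percentage at a given time for the two-step gradient (0-20% B in 2 min, then 20-40% B over 40 or 80 min). The function name is hypothetical, and holding at 40% B after the gradient ends is an assumption, since the text does not say what follows the second step.

```python
def percent_b(t_min, second_step_min=40.0):
    """Percentage of buffer B at time t_min for the two-step preparative gradient."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= 2.0:
        # First step: 0-20% B over 2 min.
        return 20.0 * t_min / 2.0
    # Second step: 20-40% B over second_step_min; assumed to hold at 40% B afterwards.
    elapsed = min(t_min - 2.0, second_step_min)
    return 20.0 + 20.0 * elapsed / second_step_min

def second_step_slope(second_step_min):
    """Gradient slope of the second step in % B per minute."""
    return 20.0 / second_step_min

print(second_step_slope(40.0))   # slope for the standard separation
print(second_step_slope(80.0))   # shallower slope used for the 9'-10' isomers
print(percent_b(22.0), percent_b(22.0, second_step_min=80.0))
```

The two slopes reproduce the 0.5% and 0.25% per minute figures quoted in the text.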

Cleavage activity assay of T. thermophilus Ago. Recombinant T. thermophilus
Ago (0.5 µM final concentration) was incubated with a reaction mixture
containing 10 mM HEPES-KOH, pH 7.5, 100 mM NaCl, 0.5 µM guide strand, and
5 mM of CaCl₂, MgCl₂ or MnCl₂ for 30 min at 55 °C in a volume of 10 µl. Then,
5′-³²P-labelled DNA target was added to obtain the indicated final
concentrations. For single turnover conditions (0.5 µM target strand) or multiple turnover
conditions (5 µM target strand), unlabelled DNA target was spiked with
radioactive target at a concentration of approximately 0.01 µM. The incubation was
continued at 75 °C in a total volume of 15 µl. The reaction was stopped by the
addition of 15 µl stop solution (95% formamide, 50 mM EDTA and 0.02%
bromophenol blue). The cleavage products were resolved on a 12% denaturing
polyacrylamide gel, and radioactivity was monitored by phosphorimaging.


Role of the polycomb protein EED in the
propagation of repressive histone marks 


Raphael Margueron, Neil Justin, Katsuhito Ohno, Miriam L. Sharpe, Jinsook Son, William J. Drury III,
Philipp Voigt, Stephen R. Martin, William R. Taylor, Valeria De Marco, Vincenzo Pirrotta, Danny Reinberg
& Steven J. Gamblin


Polycomb group proteins have an essential role in the epigenetic maintenance of repressive chromatin states. The 
gene-silencing activity of the Polycomb repressive complex 2 (PRC2) depends on its ability to trimethylate lysine 27 of 
histone H3 (H3K27) by the catalytic SET domain of the EZH2 subunit, and at least two other subunits of the complex: SUZ12 
and EED. Here we show that the carboxy-terminal domain of EED specifically binds to histone tails carrying trimethyl-lysine 
residues associated with repressive chromatin marks, and that this leads to the allosteric activation of the methyltransferase 
activity of PRC2. Mutations in EED that prevent it from recognizing repressive trimethyl-lysine marks abolish the activation 
of PRC2 in vitro and, in Drosophila, reduce global methylation and disrupt development. These findings suggest a model for 
the propagation of the H3K27me3 mark that accounts for the maintenance of repressive chromatin domains and for the
transmission of a histone modification from mother to daughter cells.


The fate of a cell is specified by its gene expression profile, often set early
in development and maintained throughout the lifetime of the cell by
epigenetic mechanisms. The polycomb group of proteins functions by
silencing inappropriate expression by maintaining a repressive
epigenetic state. It is thought that the PRC2-mediated trimethylation
of lysine 27 on histone H3 (H3K27me3) has a crucial role in marking
repressive chromatin domains, whereas PRC1 is important for
effecting transcriptional repression. Thus, once established, H3K27
trimethylation is the epigenetic mark for maintaining transcriptional
repression. Mechanisms are therefore required to maintain this mark
in repressed chromatin domains in non-dividing cells and to restore it
after the twofold dilution caused by DNA replication in dividing cells.
However, it is not yet clear how PRC2 complexes recognize previously
marked sites and how they accurately propagate these repressive marks
to unmodified nucleosomes deposited during DNA replication.

The histone lysine methyltransferase (HKMT) activity of the 
PRC2 complex resides in the SET-domain-containing protein 
EZH2 (refs 2-5), but activity requires the other subunits of the core 
complex; the zinc-finger-containing SUZ12 and the WD40 repeat 
proteins EED and RbAp48 (also known as CAF1). In certain contexts, 
the PHD-domain-containing protein PHF1 plays an important part 
in modulating the HKMT activity of PRC2 (refs 6, 7). In this work 
we have examined the structure and biochemistry of EED, and deter- 
mined the role of its homologue ESC in Drosophila development. 
From this we have established that the EED subunit of PRC2 binds 
to repressive methyl-lysine marks, ensuring the propagation of 
H3K27 trimethylation on nucleosomes by allosterically activating 
the methyltransferase activity of the complex (see Supplementary 
Fig. 1). 


The aromatic cage of EED binds repressive chromatin marks 


We crystallized a truncated version of EED (residues 77 to 441, hereafter
ΔEED) and used selenomethionine-substituted ΔEED to solve the
structure. The WD40-repeats of ΔEED fold into a seven-bladed
β-propeller domain with a central pocket on either end (Fig. 1), as seen
previously. We noticed unaccounted electron density in one of these
pockets; our crystallization mixture included a non-detergent
sulphobetaine additive, NDSB-195, which we were able to build into the extra
electron density. Because the quaternary amine of the sulphobetaine
resembled a trimethylated lysine side chain, we reasoned that EED
might bind to trimethylated lysine residues on the N-terminal tails of
histones.

Figure 1 | Trimethyl-lysine binding to an aromatic cage on EED. Ribbons
representation of the EED–H3K27me3 complex, in which EED is in grey and
the histone peptide is in yellow with its methyl-lysine side chain shown in
stick representation. The Cα positions of the aromatic cage are shown as blue
circles, and the Cα position of Tyr 358 by a red circle. The right panel shows
the methyl-lysine-binding site with 2Fo − Fc electron density for the four
cage residues and the H3K27me3 peptide. Designed mutations to the cage
are shown in red in parentheses. The side chain of Met 256 is also shown; this
is equivalent to Met 236 in ESC, which has been identified from classical
genetic screens in Drosophila as essential for the function of EED.

Histone lysine residues methylated in vivo include H3K4, H3K9,
H3K27, H3K36, H3K79, H4K20 and H1K26. We measured the binding
affinity of ΔEED for trimethylated versions of these lysine residues using
synthetic peptides in fluorescence competition assays. ΔEED bound to
H1K26me3, H3K9me3, H3K27me3 and H4K20me3 peptides with
dissociation constant (Kd) values ranging from 10 to 45 µM, and the
binding became approximately fourfold weaker for each successive loss
of a methyl group from the methyl-lysine (Supplementary Table 1).
Notably, ΔEED did not bind appreciably to H3K4me3, H3K36me3 or
H3K79me3, marks associated with active transcription. We validated
these results by isothermal titration calorimetry (Supplementary Table
1 and Supplementary Fig. 2b), and there is good agreement between the
two independent methods.
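
As a rough illustration of what these affinities imply, the sketch below applies the
approximately fourfold weakening per lost methyl group to a trimethyl-lysine Kd and
converts each Kd into an equilibrium occupancy. It assumes simple 1:1 binding with
peptide in excess over ΔEED, treats the fourfold factor as exact, and uses 40 µM (a
value inside the reported 10 to 45 µM range, chosen only for illustration) as the
starting Kd. The function names are hypothetical and are not taken from the paper.

```python
def kd_after_methyl_loss(kd_me3_um: float, methyls_lost: int, fold: float = 4.0) -> float:
    """Kd (µM) after removing 0-3 methyl groups, assuming a constant fold-weakening per group."""
    if not 0 <= methyls_lost <= 3:
        raise ValueError("a trimethyl-lysine can lose between 0 and 3 methyl groups")
    return kd_me3_um * fold ** methyls_lost


def fraction_bound(peptide_um: float, kd_um: float) -> float:
    """Equilibrium occupancy for 1:1 binding with peptide in excess over protein."""
    return peptide_um / (peptide_um + kd_um)


if __name__ == "__main__":
    kd_me3 = 40.0  # illustrative value within the reported 10-45 µM range (assumption)
    for lost, label in enumerate(["me3", "me2", "me1", "me0"]):
        kd = kd_after_methyl_loss(kd_me3, lost)
        occupancy = fraction_bound(40.0, kd)
        print(f"{label}: Kd ~ {kd:.0f} µM, occupancy at 40 µM peptide ~ {occupancy:.2f}")
```

Under these assumptions the occupancy at a fixed peptide concentration falls steeply with
each lost methyl group, which is one way to picture why the trimethylated state dominates
binding.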

Figure 2 | Interactions between EED and trimethylated histone peptides.
a, Schematic representation of the interaction between EED and H3K27me3.
For clarity, the aromatic methyl-lysine-binding cage has been omitted and the
methylated lysine side chain is shown as a yellow circle. Hydrogen bonds from
the main-chain carbonyl of the methyl-lysine with EED, and the residue
immediately N-terminal to it, are shown as dashed lines. The green hatching
indicates the hydrophobic pocket on EED that accommodates the alanine side
chain two residues before the methyl-lysine. b, EED is shown in surface
representation with a composite of two of its cognate peptides shown in stick
representation and coloured yellow (H3K9me3) and pink (H4K20me3).
c, The pocket on EED that accommodates Ala (−2) from the H3K9me3
peptide is shown. d, The other pocket that contains a Leu (+2) from the
H4K20me3 peptide. The EED surface is coloured according to atom type.

Next, we solved the structure of ΔEED co-crystallized with
H1K26me3, H3K9me3, H3K27me3 and H4K20me3 peptides (Supplementary
Table 2 and Supplementary Fig. 3). The peptides in the
four co-crystal structures adopt similar, largely extended structures
and all exploit the aromatic cage of ΔEED to recognize the trimethyl-lysine
residue (Fig. 1 and Supplementary Fig. 4). This is the first
example of such a binding site on a β-propeller domain and it consists
of three aromatic side chains, Phe 97, Tyr 148 and Tyr 365 (Fig. 1).
The trimethyl-ammonium group of the lysine is inserted into this cage
and is stabilized by van der Waals and cation–π interactions. A fourth
aromatic side chain (Trp 364) interacts with the aliphatic moiety of
the lysine side chain by hydrophobic interactions (Figs 1, 2 and
Supplementary Fig. 5). Adjacent to the methyl-lysine pocket, ΔEED
makes two hydrogen-bond interactions with carbonyls on the
peptides (Fig. 2a). First, the main-chain carbonyl of the methyl-lysine
residue forms hydrogen bonds with the side chain of Arg 414. Second,
the main-chain carbonyl of the residue immediately amino-terminal
of the methyl-lysine on the peptide makes a hydrogen bond with the
main-chain amide of Trp 364. The residues flanking the methyl-lysine
residue, at the −1 and +1 positions, are oriented away from the
protein, whereas the next residues, at the −2 and +2 positions, make
important contacts (Fig. 2). Comparison of the four complexes (Fig. 2
and Supplementary Figs 2A and 4) suggests an important role for two
distinct hydrophobic interaction sites (Fig. 2b). H1K26, H3K9 and
H3K27 each have an alanine residue two amino acids N-terminal to
the lysine (−2), which fits into a small pocket on the surface of EED
formed by the hydrophobic moieties of Trp 364, Tyr 308 and Cys 324
(Fig. 2c). The size of this pocket is sufficient to accommodate an
alanine residue but not larger hydrophobic residues. For the
H4K20 peptide, the only one of the four that bound to ΔEED and
lacks an alanine at −2, binding is facilitated by an alternative
hydrophobic interaction between the leucine residue in the +2 position
of the peptide and a second hydrophobic pocket formed by
residues Ile 363, Ala 412 and the γ-carbon of Gln 382 of EED
(Fig. 2d). It seems that the ability to exploit one of these two small
hydrophobic pockets is an important component of the specificity of
EED towards the methyl-lysine marks associated with repressive chromatin.
However, the affinity of EED for these modified peptides is
relatively modest, and it is likely that this interaction only becomes
physiologically relevant in association with the histone-binding
activity of other components of the PRC2 complex, as suggested by
earlier work on Drosophila PRC2 (ref. 11).

To probe the physiological role of the aromatic cage of EED, site-directed
mutants of several of the cage residues were created.
Mutations of Phe 97, Trp 364 and Tyr 365 to alanine produced
well-behaved protein, and competition experiments showed that
the Trp364Ala and Tyr365Ala mutants had no detectable binding
to H1K26me3 peptides, whereas ΔEED Phe97Ala bound about eightfold
more weakly than wild-type ΔEED to histone peptides (Supplementary
Table 1). As a control for the effect of mutating an
aromatic residue of EED that is not involved in the
aromatic cage, we also generated the mutation Tyr358Ala (Fig. 1);
binding by this mutant was reduced by about twofold (Supplementary
Table 1).


PRC2-EZH2 and EED specifically bind H3K27me3 nucleosomes 

Figure 3 | EED and PRC2 interaction with chromatin. Pull-down
experiment to analyse the interaction between EED, PRC2–EZH2 wild type
or reconstituted with EED(Tyr365Ala) and H3K27-modified chromatin
(left), or between PRC2–EZH2 wild type and H3K9-modified chromatin
(right). Nuc, nucleosomes; un, unmethylated. Note that the ‘beads only’
control for interaction with H3K9-modified chromatin is not shown but was
identical to the control shown in the figure.

Next, we used nucleosome arrays reconstituted with chemically
modified histones that carry a single modification of the four possible
methylation states of H3K27, H3K36 or H3K9 (ref. 12). The nucleosome
arrays were incubated with full-length His-tagged EED protein
followed by nickel-nitrilotriacetic acid (Ni-NTA) pull-down assays.
Western blotting for H3 and EED demonstrated an interaction
between EED and nucleosomes containing H3K27me3 (Fig. 3). This
interaction was specific, as EED was not able to interact with chromatin
reconstituted with histones containing the different levels of H3K36
methylation (Supplementary Fig. 8), but did bind to chromatin
trimethylated on H3K9 (data not shown). Interestingly, the truncated
ΔEED protein tested in the peptide-binding experiments also failed to
interact with nucleosomes (Supplementary Fig. 8). Presumably, the
diminished binding is due to the absence of a previously characterized
H3-binding site within the N terminus of EED (ref. 13), which may act
together with the methyl-lysine-binding site to achieve stable binding.

Given that other subunits of PRC2 contact histones and thus modulate
chromatin binding, we repeated the nucleosome-binding experiment
using a PRC2 complex purified from insect cells co-infected with
baculovirus expressing each of the subunits (Fig. 4a, left). Although, as
expected, the reconstituted PRC2 complex showed some binding to
unmodified chromatin, the complex bound considerably more tightly to
chromatin carrying the H3K27me3 or the H3K9me3 modification
(Fig. 3). Interestingly, PRC2 reconstituted with EED carrying the
Phe97Ala or Tyr365Ala substitution did not show binding to chromatin
under these conditions, with either methylated or unmodified
nucleosomes (Fig. 3 and data not shown). Together, our results
demonstrate that the aromatic cage in EED is critical for the PRC2
complex to bind to repressive marks, through its specific recognition
of defined (repressive) trimethylated-lysine residues.


Trimethylated repressive marks stimulate PRC2 activity 


Figure 4 | Peptide mimicking repressive marks stimulates PRC2 activity.
a, Left, Coomassie blue staining of reconstituted PRC2–EZH2 complex.
Right, HKMT assay with PRC2–EZH2 alone or in the presence of 10 and
40 µM H3K27 unmodified, mono-, di- or trimethylated peptides. b, Titration
of the methyl donor (S-adenosyl-methionine) in the presence of H3K27me0/1/2/3
peptides. d.p.m., disintegrations per min. c, Nucleosome titration in the
presence of H3K27me0/3 peptides. d, Left, Coomassie blue staining of
reconstituted PRC2–EZH2 EED(Tyr365Ala) complex. Right, HKMT assay
with the corresponding complex in the same conditions as a. e, Relative PRC2
histone methyltransferase activity in the presence of various peptides as
indicated. f, Table indicating the peptides used for the stimulation study as
well as their Kd values (µM) for ΔEED binding. g, Relative PRC2 histone
methyltransferase activity in the presence of various peptides as indicated.

Because a probable function for the binding of PRC2 to trimethylated
lysine would be to contribute to the propagation of the H3K27me3
mark, we performed HKMT assays using recombinant oligonucleosomes
in the presence of methylated peptides. The addition of
unmodified or monomethylated H3K27 peptides did not significantly
affect the enzymatic activity of PRC2, but trimethylated peptides
activated it by about sevenfold (Fig. 4a). Stimulation of enzymatic
activity by the H3K27me3 peptide reached a plateau around
100 µM, and half-maximum stimulation was achieved at 30–40 µM
(Supplementary Fig. 7), which is in good agreement with the dissociation
constant determined for ΔEED and the H3K27me3 peptide
(Supplementary Table 1) and gives us strong confidence that the
binding event we observe with purified, truncated EED is closely
correlated with the allosteric activation mechanism. We also determined
the Michaelis parameters for PRC2 in the presence of
variously methylated H3K27 peptides (Fig. 4b, c). During titrations
of S-adenosyl-methionine (SAM) we observed a marked increase in
the maximum reaction rate (Vmax) in the presence of the H3K27me3
peptide. A similar result was observed with titration of nucleosomes
(Fig. 4c). Notably, in both cases the substrate concentration required
to achieve the half-maximal reaction rate (Km) is not significantly
affected by the incubation with peptides.

To ascertain whether the observed stimulation was EED-mediated,
mutant PRC2 complexes containing EED(Phe97Ala) or EED(Tyr365Ala)
were reconstituted (Fig. 4d and Supplementary Fig. 8).
These mutant recombinant PRC2 complexes retain a similar basal
activity to wild type, but neither mutant recombinant PRC2 was
stimulated by the addition of H3K27me3 peptides (Fig. 4d). Our data
also show that the H3K9me3, H4K20me3 and H1K26me3 peptides
were all able to stimulate PRC2 activity to some extent, whereas the
H3K4me3 and H3K36me3 peptides were ineffectual (Fig. 4e).
However, we noticed that the binding affinity to EED and stimulation
of PRC2 activity do not strictly correlate (that is, H3K9me3 has a good
binding affinity for EED but stimulates PRC2 activity relatively
poorly). To investigate the role of histone sequence in binding and
activation, we first mutated the arginine residue at the −1 position (present
in all four histone peptides that activate the methyltransferase activity
of PRC2) to alanine (Arg26Ala of H3K27me3 in Fig. 4f, g).
Remarkably, although the binding of this mutant peptide to ΔEED is
only reduced by about 1.5-fold (Fig. 4f), it is no longer able to activate
PRC2 HKMT activity (Fig. 4g), demonstrating that repressive-histone-peptide
binding to the aromatic cage of EED is necessary, but not
sufficient, for PRC2 activation. To further test this model we made a
series of chimaeric and mutant peptides that show that the lysine at −4,
the alanine at −3 and the arginine at −1 are not important for binding
to EED but are key to the activation of PRC2. We propose that these are
the residues that mediate an interaction with another part of the PRC2
complex that leads to its activation.
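
The kinetic pattern reported in this section for the SAM and nucleosome titrations (a
higher maximum rate with an essentially unchanged half-saturating substrate
concentration) can be pictured with the standard Michaelis-Menten rate law. The
following is a minimal sketch rather than a fit to the data: the Vmax and Km values are
hypothetical placeholders in arbitrary units, and the sevenfold factor is borrowed from
the peptide stimulation reported for Fig. 4a, not from a measured Vmax ratio.

```python
import math


def michaelis_menten(substrate: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


# Hypothetical parameters in arbitrary units; only the pattern matters here.
KM = 1.0
VMAX_BASAL = 1.0
FOLD_ACTIVATION = 7.0  # assumption: taken from the ~sevenfold stimulation in Fig. 4a


def rate_ratio(substrate: float) -> float:
    """Stimulated/basal rate when the activator changes Vmax but leaves Km alone."""
    stimulated = michaelis_menten(substrate, VMAX_BASAL * FOLD_ACTIVATION, KM)
    basal = michaelis_menten(substrate, VMAX_BASAL, KM)
    return stimulated / basal


if __name__ == "__main__":
    for s in (0.1, 1.0, 10.0):
        print(f"[S] = {s:>4} -> stimulated/basal = {rate_ratio(s):.2f}")
    # With Km unchanged, the half-maximal rate is still reached at [S] = Km.
    half_max = michaelis_menten(KM, VMAX_BASAL * FOLD_ACTIVATION, KM)
    assert math.isclose(half_max, VMAX_BASAL * FOLD_ACTIVATION / 2)
```

When only Vmax changes, the fold stimulation is the same at every substrate
concentration, which is the behaviour one would expect if the peptide acts on catalysis
rather than on substrate binding.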


PRC2 function requires recognition of repressive marks 


Figure 5 | The aromatic cage in Drosophila ESC is important for its in vivo
function. a, Left, Coomassie blue staining of reconstituted dPRC2-E(Z)
complex. Right, HKMT assay with dPRC2-E(Z) and H3K27me0/1/2/3
peptides. b, Top, amino acid residues Phe 77, Tyr 338 and Phe 345 that were
mutated to Ala in Drosophila ESC and the corresponding residues in EED.
Bottom, schematic representation of transposon constructs. c, Rescue
experiment. See Supplementary Figs 9 and 10 for crossing scheme details.
Several independent lines were examined for each transgenic construct and
showed the same phenotype except for one line of Myc-ESC(Tyr338Ala). In
the case of Myc-ESC(Phe345Ala), transgenes were inserted at φC31 att sites
at 68E and 86Fb, respectively. For direct comparison, wild-type Myc-ESC
lines were also established at the same chromosomal location and showed
the same results as lines established by conventional P-element
transformation. d, Immunoprecipitation using ovarian extracts. ESC
aromatic cage mutation does not impair binding to E(Z). The double
Myc-ESC bands are caused by phosphorylation of ESC. e, Scheme indicating
the genomic location of primers used for ChIP. f, ChIP analysis of E(Z)
binding to the bxd PRE (FM4) in homozygous esc^6 escl^d01514 expressing
wild-type or aromatic cage mutant Myc-ESCs. yw indicates the wild-type stock
with endogenous wild-type ESC and ESCL. g, ChIP analysis of the
H3K27me3 distribution at four sites in the Ubx gene. h, Histone H3K27
methylation in esc^6 escl^d01514 double-mutant larvae expressing wild-type or
aromatic cage mutant Myc-ESCs.

To evaluate the importance of EED binding to trimethylated marks in
vivo, we turned to ESC, the EED homologue in Drosophila, and tested
the effect of mutating its aromatic cage. We reconstituted the
Drosophila PRC2 complex and showed that addition of H3K27me2/3
peptides to the HKMT assay resulted in a robust stimulation of PRC2
enzymatic activity (Fig. 5a). ESC is required throughout development,
but in the early embryo the maternal stock of esc product is critical, as
evidenced by the resultant derepression of homeotic genes in embryos
produced by esc mutant mothers. At later stages of development, PRC2
activity is sustained through the overlapping participation of ESC and
its close homologue ESCL. Overexpression of ESC in the ovaries
(for example, in a female with one extra esc copy) can supply enough
function to allow the development of esc mutant embryos, producing flies
that are virtually normal except for the eponymous extra sex combs in
males.

We constructed mutations affecting the aromatic cage:
Phe77Ala (equivalent to Phe 97 in EED) and Phe345Ala (equivalent
to Tyr 365 in EED), as well as Tyr338Ala (equivalent to Tyr358Ala in
EED) just preceding the aromatic cage (Fig. 5b), and expressed Myc-tagged
wild-type or mutant esc transgenes under the control of the esc
promoter (Fig. 5b). Although the wild-type transgene rescued the extra
sex comb phenotype almost completely (217 out of 218 males
counted), the aromatic cage mutant transgenes were ineffectual (no
rescue in several hundred males examined) (Fig. 5c). Flies lacking both
zygotic ESC and ESCL in these crosses produce larvae with poorly
developed brain and imaginal discs, which die when they pupate.
This lethality is completely rescued by one copy of the wild-type
esc>Myc-ESC transgene. In contrast, none of the aromatic cage
mutant transgenes were able to rescue the lethality even when present
in two copies (Fig. 5c), although the esc>Myc-ESC(Phe77Ala) transgene
alleviated the brain and imaginal disc phenotypes (data not
shown). Of note, zygotic expression of the Phe345Ala transgene
impaired the contribution of wild-type esc, indicating that this mutant
acts as a dominant negative. The failure of the mutant Myc-ESC to
rescue is not due to instability or the inability to be incorporated into a
PRC2 complex: the ESC mutants were expressed at levels comparable
to that of the wild type. Furthermore, immunoprecipitation experiments
showed that the mutant ESCs co-immunoprecipitated with
endogenous E(Z) as efficiently as the wild-type protein (Fig. 5d).

To determine whether the mutant ESCs affected PRC2 function with
respect to its gene targeting or activity, we performed chromatin
immunoprecipitation (ChIP) followed by quantitative PCR with the
primer sets indicated in Fig. 5e. Immunoprecipitation using anti-E(Z)
shows that wild-type Myc-ESC is nearly as effective as endogenous ESC
(compare with the esc escl double-mutant chromatin), whereas PRC2 complex
with Myc-ESC bearing mutations in the aromatic pocket is recruited less
efficiently to the Ubx polycomb response element (PRE) (Fig. 5f).
Chromatin immunoprecipitation with anti-H3K27me3 antibodies
also shows that wild-type Myc-ESC is nearly as effective as endogenous
ESC (yw) in trimethylating H3K27 in the Ubx upstream enhancer
region (PBX, −30 kilobases (kb)), in the vicinity of the PRE (FM1,
FM6, −23 kb) or at the Ubx promoter. Notably, the mutant ESCs
are deficient in the extent of H3K27me3 (Fig. 5g), and this decrease
correlates with the phenotypes described in Fig. 5c. Importantly, the
observed effects are due to the aromatic cage, as a mutation of
Tyr338Ala, which is not important for cage formation, had no effect
(Fig. 5g). Finally, we analysed the global levels of H3K27 methylation
by western blot (Fig. 5h). We observed an almost complete loss of
H3K27me3 in extracts from esc escl double-mutant larvae expressing the
mutant ESCs. Perhaps surprisingly, the H3K27me2 levels were equally
strongly affected.


Discussion 


Chromatin domains are distinguished by the presence of a characteristic
set of marks. When these marks are used to sustain an epigenetic state, 
eukaryotic cells must have the means of propagating these marks 
through cellular division and of ensuring that they obey appropriate 
boundaries during development. That PRC2 might recognize the chromatin
mark it sets was anticipated by Hansen et al., who reported that
PRC2 binds to H3K27me3, although this study did not address the 
mechanism for the propagation of H3K27me3. Our work shows the 
structural and functional basis for epigenetic self-renewal, and leads us 
to conclude that PRC2 readout of H3K27me3 (and to a lesser extent 
other ‘repressive’ marks) is key to the propagation of this repressive 
mark. 

A combination of aromatic and hydrophobic residues is commonly
used by proteins that recognize methylated lysine residues and has been
found in chromo-, tudor- and plant homeo-domains (PHD), but
no such arrangement has previously been described for any
WD40-repeat-containing protein (for example, Supplementary Fig. 7 and
ref. 21). Sequence analysis across the family of β-propeller domains
leads us to conclude that the ability of EED to specifically recognize
repressive methyl-lysine marks is a feature limited, among WD40 proteins,
to EED-related molecules (Supplementary Fig. 6).

This methyl-lysine interaction provides an extra contribution to 
nucleosome binding that is mainly driven by a combination of contacts 
from other subunits of PRC2; RbAp48 binds to histone H4 (refs 22, 23) 
and the N-terminal domain of EED binds to H3 (ref. 13), and it may 
well be that these different interactions act cooperatively. In Drosophila, 
recruitment of PRC2 may also be facilitated by certain DNA-binding 
factors. Our Drosophila experiments show that when the Drosophila
EED orthologue ESC bears mutations in the aromatic cage, the recruit- 
ment of PRC2 to the PRE is less effective, as shown by the drop in E(Z) 
binding to the bxd PRE, the massive reduction in the global level of 
H3K27me2/3 and by the phenotype of the Phe77Ala and Phe345Ala 
mutants. Our chromatin modification assays suggest that a major 
effect of EED binding to repressive methyl-lysine marks is the stimu- 
lation of PRC2 methyltransferase activity, thus providing a mecha- 
nism for the propagation of this mark. Thus, when PRC2 is recruited 
to appropriate chromatin domains, the presence of pre-existing 
H3K27me3 marks on neighbouring nucleosomes activates the com- 
plex to carry out further methylation of unmodified H3K27 (Sup- 
plementary Fig. 1). Accordingly, a polycomb group target gene that 
had been repressed in one cell cycle will tend to be repressed again in 
the next cell cycle, and previously active genes will be left unmodified 
at H3K27. We propose that the ability to recognize a previously 
established mark that triggers its renewal is a feature that will be found 
in other epigenetic mechanisms mediated by histone modifications. 


METHODS SUMMARY 


N-terminally truncated EED (ΔEED) was expressed as a glutathione
S-transferase (GST) fusion protein in Escherichia coli. Crystals were grown by 
the hanging drop method using 4.0 M sodium formate as a precipitant, together 
with NDSB-195 or as a complex with histone peptides. Diffraction data were 
processed using Denzo and Scalepack, the native structure was solved by single 
wavelength anomalous dispersion (SAD) and built by ARP/wARP. Histone 
complexes were solved by molecular replacement and refined using Refmac5 
with manual model building using O or Coot. The affinity of wild-type EED for 
histone peptides was determined by competition fluorescence spectroscopy or 
isothermal titration calorimetry (ITC). Fluorescence spectroscopy was per- 
formed at 20°C using a SPEX FluoroMax fluorimeter; dansyl-labelled peptides 
bound to EED were competed off by excess cold peptide and the change in 
fluorescence was monitored. ITC reactions were performed at 20°C and used 
to verify EED wild-type binding and also to determine the affinity of EED 
mutants to trimethylated histone peptides. Recombinant PRC2 complexes were 
purified from insect cells after infection with baculovirus. Chromatin for inter- 
action and HKMT assay was refolded by salt dialysis. Histone H3 carrying the 
Cys110Ala mutation was chemically modified using the method described
previously. ESC transgene construction and the esc and escl mutant fly lines
have been described previously. Chromatin immunoprecipitation and analysis by
quantitative PCR were done according to refs 26 and 27.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Protein expression and purification. Residues 78–441 of EED (ΔEED) were
cloned into pGEX-4T vector (Amersham Biosciences) and expressed in E. coli.
Proteins were prepared as N-terminal GST-fusion proteins and cleaved from
GST with human α-thrombin (Haematologic Technologies, Inc.). Proteins were
purified further using size-exclusion chromatography (Superdex 200, GE
Healthcare) in buffer containing 50 mM Tris-HCl, pH 8.7, 150 mM NaCl and
3 mM TCEP. Site-directed mutants of ΔEED were generated with the ExSite
protocol (Stratagene) and purified in a similar manner. Crystallographic and
binding studies were carried out using a construct containing the mutation
Met370Thr; however, the binding properties of this construct are identical to
those of the ‘wild-type’ construct. Peptides were synthesized and purified by
reversed-phase HPLC at the University of Bristol Peptide Synthesis Facility.
Peptide masses were verified by mass spectrometry.

Crystallography. For crystallization trials, protein solutions were prepared
either as ΔEED alone at 2.5 mg ml−1 or as a complex solution at 1.5 mg ml−1
with peptide at a sevenfold higher molar ratio. All protein solutions contained
TCEP at 15 mM concentration. Crystals were grown at 18 °C using the vapour
diffusion technique in hanging drops. Drops were prepared by mixing equal
volumes of ΔEED protein alone with reservoir solution containing 4.0–4.1 M
formate and 0.6–0.7 M NDSB-195, or by mixing equal volumes of ΔEED protein
complex with 3.7–3.9 M formate solution. Crystals were transferred into mother
liquor with 5–10% glycerol before flash cooling in liquid nitrogen. Diffraction
data for the ΔEED-only native and selenomethionine crystals were collected at
the Daresbury synchrotron on beamline 10.1 at the peak wavelength for
selenium. Diffraction data for the H1K26me3, H3K9me3 and H4K20me3
protein complex crystals were collected using an in-house MicroMax 007HF
rotating anode coupled to an R-AXIS IV++ detector. Data for H3K27me3 were
collected at Diamond Light Source on beamline I04 at a wavelength of 0.97 Å. Data
were integrated using Denzo and scaled with Scalepack. Phases for the
selenomethionine-substituted ΔEED structure were generated and extended using the
SAD method and the SOLVE and RESOLVE programs. Phases from RESOLVE
were used to autobuild a model with ARP/wARP in warpNtrace mode. The
protein complex crystal structures were solved by molecular replacement using
AMoRe and the selenomethionine-substituted ΔEED structure as the search
model. Standard refinement was carried out with refmac5 (ref. 33) and CNS
together with manual model building with O and Coot. Figures were created
with PyMOL (DeLano Scientific; http://pymol.sourceforge.net/).

Binding studies. Histone-peptide-binding experiments were performed by
competition fluorescence spectroscopy and ITC. All fluorescence emission spectra were
measured using a dansyl-labelled peptide (sequence: KKKARK(Me3)SAGAAK-dansyl)
at 20 °C in 50 mM Tris-HCl, pH 8.7, 150 mM NaCl and 3 mM TCEP.
Measurements were recorded using a SPEX FluoroMax fluorimeter (excitation
wavelength 330 nm, emission wavelength 537 nm). Binding of dansyl peptide to
EED was monitored by titrating excess EED into 5 µM peptide. Dissociation
constants for the unlabelled histone peptides were determined using a competition
assay by adding excess unlabelled peptide to a complex of 35 µM EED with 35 µM
dansyl peptide and monitoring the subsequent reduction in fluorescence. ITC
measurements were carried out by injecting peptide at 400–1,000 µM into the
ITC cell containing Δ77EED at 40–100 µM. Experiments were performed at
20 °C in 50 mM Tris-HCl, pH 8.7, 150 mM NaCl and 3 mM β-mercaptoethanol
(BME).
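
Because EED and the dansyl peptide are present at comparable concentrations (35 µM
each) in the competition assay, the amount of starting complex is not well described
by the usual excess-ligand approximation. The sketch below, offered only as an
illustration, solves the exact 1:1 binding quadratic for the complex concentration. The
Kd of the dansyl-labelled peptide is not stated in the text, so the value used here is a
hypothetical placeholder, as is the function name.

```python
import math


def complex_concentration(protein_total: float, ligand_total: float, kd: float) -> float:
    """Concentration of 1:1 complex (same units as the inputs), exact quadratic solution."""
    b = protein_total + ligand_total + kd
    return (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0


if __name__ == "__main__":
    eed_um = 35.0     # from the competition assay conditions
    dansyl_um = 35.0  # from the competition assay conditions
    kd_um = 30.0      # hypothetical Kd for the dansyl peptide; not given in the text
    bound = complex_concentration(eed_um, dansyl_um, kd_um)
    print(f"complex ~ {bound:.1f} µM ({bound / dansyl_um:.0%} of the dansyl peptide bound)")
```

The smaller root of the quadratic is used because it is the only one that keeps the
complex concentration below both total concentrations.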

Methyl lysine analogue production. Pseudo-lysine (ψK)-containing histones
were generated by a modification of known literature methods. In brief,
proteins to be modified (5–10 mg) were weighed into 1.5-ml siliconized
Eppendorf microcentrifuge tubes and 950 µl alkylation buffer (4 M guanidinium
chloride, 1 M HEPES, 10 mM D/L-methionine at pH 7.8; the solution was
passed through a 0.22-µm filter and purged with argon before use) was added.
Proteins that did not readily dissolve were sonicated for 10–15 min in a Branson
1510 ultrasonic cleaning bath at ambient temperature to effect dissolution. The
resultant clear colourless solutions were treated with 20 µl of a 1 M dithiothreitol
(DTT) solution in alkylation buffer prepared just before use, and agitated at
37 °C for 1 h. At the end of this period the fully reduced proteins were treated as
indicated below.

(1) Pseudo-lysine (ψK-NH2): 100 µl of a 1 M 2-chloroethylamine
monohydrochloride solution in alkylation buffer (prepared just before use) was added to
the reduced histone. The mixture was agitated in the dark at 45 °C for 2.5 h. At
the end of this period the mixture was treated with a second portion of DTT
(10 µl of the above 1 M solution) and heated with agitation at 45 °C for a further
2.5 h. The reaction was then quenched with BME (50 µl) and cooled to room
temperature before purification as outlined below.

(2) Pseudo-monomethyl-lysine (ψK-Me1): 100 µl of a 1 M N-methylaminoethyl
chloride hydrochloride solution in alkylation buffer (prepared just before use) was
added to the reduced histone. The mixture was agitated in the dark at 45 °C for 2.5 h.
At the end of this period the mixture was treated with a second portion of DTT
(10 µl of the above 1 M solution) and heated with agitation at 45 °C for a further
2.5 h. The reaction was then quenched with BME (50 µl) and cooled to room
temperature before purification as outlined below.

(3) Pseudo-dimethyl-lysine (ψK-Me2): 50 µl of a 1 M 2-(dimethylamine)ethyl
chloride hydrochloride solution in alkylation buffer (prepared just before use)
was added to the reduced histone. The mixture was agitated in the dark at 25 °C
for 2 h. At the end of this period the mixture was treated with more DTT (10 µl of
the above 1 M solution) and agitated at 25 °C for 30 min before addition of 50 µl of
the 1 M 2-(dimethylamine)ethyl chloride hydrochloride solution. The reaction
was allowed to proceed at ambient temperature for a further 2 h, quenched with
BME (50 µl), and cooled to room temperature before purification as outlined
below.

(4) Pseudo-trimethyl-lysine (ψK-Me3): 100 µl of a 1 M (2-bromoethyl)
trimethyl-ammonium bromide solution in alkylation buffer (prepared just
before use) was added to the reduced histone. The mixture was agitated in the
dark at 50 °C for 2.5 h. At the end of this period the mixture was treated with a
second portion of DTT (10 µl of the above 1 M solution) and heated with agitation
at 50 °C for an extra 2.5 h. The reaction was then quenched with BME (50 µl)
and cooled to room temperature before purification as outlined below.

Purification scheme: A PD-10 column was pre-equilibrated with 0.1% BME in
18 MΩ water. This was loaded with the reaction mixture; the reaction tube was rinsed
with 1 ml alkylation buffer and this was also added to the top of the column. The
proteins were then eluted according to the manufacturer’s protocol for centrifugal
isolation. The eluent was frozen and lyophilized, providing the modified
histones as crispy foams. A portion of each (~0.1 mg) was analysed by
reverse-phase HPLC and matrix-assisted laser desorption/ionization-time of flight
(MALDI-TOF) mass spectrometry to ensure product identity and homogeneity.

Chromatin and interaction experiment. Histone H3 variants with the respective
point mutations (Lys to Cys at the position to be modified, and Cys to Ala at
position 110) were expressed in E. coli, purified from inclusion bodies, and
solubilized in 7 M guanidine hydrochloride, 20 mM Tris, pH 8, 10 mM DTT.
After dialysis to replace guanidine hydrochloride with 7 M urea, histones were
further purified by sequential anion and cation chromatography.
Histone-containing fractions were pooled, dialysed against 5 mM BME, and lyophilized.
Histones were reconstituted into octamer as previously described and chromatin
was formed by salt dialysis. To prevent nonspecific binding to free histone
in the pull-down experiment, chromatin was further purified on an agarose2
column. For interaction, 2 µg chromatin was incubated with 2 µg protein or
complex of interest in buffer A (50 mM Tris, pH 8.0, 50 mM NaCl, 1 mM
EDTA, 0.01% NP40) for 2 h at 4 °C in the presence of Ni-NTA beads (EED) or
M2-beads (PRC2). Beads were extensively washed, eluted with 1× SDS-PAGE
loading buffer and analysed by western blot.

HKMT assay. HKMT assays were performed as previously described**. For 
autoradiography exposure, the conditions were as follows: 1.5 µg chromatin, 
100 ng reconstituted PRC2 complex, 5–40 µM peptide and 0.3 µM ³H-SAM. 
For scintillation counting, the assay was performed as follows: 1.5 µg chromatin, 
50 ng reconstituted PRC2 complex, 100 µM peptide, 24.8 µM SAM (³H-SAM/SAM 
ratio 1/30) for 15 min, unless otherwise stated in the figure legend. 

SF9 culture, infection and complex purification. As previously described”*. 
Antibodies. H3K27me1 (Millipore), H3K27me2 (Abcam, ab24684 and ab6002), 
H3K27me2 and H3K27me3 (gift from T. Jenuwein), total H3 (Abcam, ab1791), 
Flag (Sigma), Myc 9E10 (Chemicon). Previously described antibodies were used 
for EED** and E(Z)*'. 

Fly strains and mutants. The Df(1)y'w°’? strain (yw) was used as wild-type 
control and for P-mediated germ-line transformation. For transgene insertion at 
specific genomic sites, we used fly strains in which the φC31 integrase gene is 
inserted on the X chromosome and attP landing sites are located in 68E or 86Fb, 
gifts from K. Basler®®. Mutant strains for esc® and esc® escl*°'>'* were used as 
described previously’’, and detailed crossing schemes to test transgene function 
are given in Supplementary Figs 9 and 10. ChIP with larval tissues was done using 
flies homozygous for esc>Myc-ESC in homozygous esc® escl“°'5'* background 
or flies homozygous for esc>Myc-ESC(Phe77Ala) in homozygous esc® esc epi 
(selected using a CyO, GFP balancer). 

Transposon construction. ESC mutant transgenes were produced using the 
esc>Myc-ESC construct’? as starting material. The esc>Myc—ESC(Phe77Ala) 
and esc>Myc-ESC(Tyr338Ala) constructs for conventional P-mediated germ- 
line transformation were assembled in the pCaS-escp construct’. To generate 
transgenic lines for esc>Myc-ESC or esc>Myc—ESC(Phe345Ala) at the same 
chromosomal locations, we used the φC31 recombinase-mediated cassette 
exchange technique (RMCE)*’. For recombinase-mediated cassette exchange, 
pCaSpeRattB plasmid was first generated by excising with BamHI the UAS 
and hsp70 minimal promoter from pUASTattB (a gift from K. Basler) and 
replacing them with a PCR-amplified multicloning site with BglII cohesive ends. 
A fragment containing the esc promoter was excised from the esc>Myc—ESC 
construct by NotI and KpnI and inserted in the NotI—KpnI site of pCaSpeRattB 
to generate pCaB-escp. The esc>Myc-ESC and esc>Myc—ESC(Phe345Ala) con- 
structs for RMCE were constructed by inserting the wild-type esc cDNA or esc 
cDNA with the relevant mutation in the KpnI site of the pCaB-escp. 

Western blotting and immunoprecipitation. Drosophila larval extracts were 
prepared as previously described". In brief, approximately 30 third instar larvae, 
frozen in liquid nitrogen, were pulverized in protein lysis buffer I (50 mM Tris, 
pH 6.8, 100 mM DTT, 2% SDS, 5 mM EDTA, 1 mM phenylmethylsulphonyl 
fluoride (PMSF) and 10% glycerol) using a micropestle. After heating at 95 °C 
for 5 min, and centrifuging for 10 min, supernatants were used for western blot 
analysis. Ovary extracts were made by homogenizing 60 sets of ovaries in extrac- 
tion buffer (30 mM HEPES-potassium hydroxide, pH 7.6, 150mM potassium 
acetate, 2mM magnesium acetate, 5 mM DTT, 0.1% NP40 and Protein Inhibitor 
cocktail (Roche)) using a micropestle, and the extracts were cleared by centrifu- 
gation. Immunoprecipitation was performed using anti-Myc and protein G 
sepharose beads (GE Healthcare). The beads were washed five times for 5 min 
in extraction buffer, boiled in sample buffer and analysed by western blotting. 
Rabbit anti-H3 and rabbit anti- H3K27me2 or -H3K27me3 were used with anti- 
rabbit IgG-alkaline-phosphatase as a secondary antibody. The rabbit anti-E(Z) 
was used with goat anti-rabbit IgG—horseradish peroxidase (HRP) and mouse 
anti-Myc was used with goat anti-mouse IgG—HRP light-chain-specific. 

ChIP with Drosophila larvae. ChIP was performed essentially as previously 
described with slight modifications**. Approximately 300 mg of larvae were 
taken for two independent experiments. The frozen larvae were first pulverized 
using a mortar and pestle in liquid N2, then homogenized with 10 strokes of a 
Dounce homogenizer in cross-linking solution (1.8% formaldehyde, 50 mM 
HEPES, pH 8.0, 1 mM EDTA, 0.5 mM EGTA, 100 mM NaCl). The homogenates 
were incubated by rotating at room temperature for 15 min. The fixation was 
stopped by washing for 5 min in 0.01% Triton X-100, 0.125 M glycine in PBS 
three times with mixing. After centrifugation at 1,500g for 3 min at 4°C, the 
pellets were washed for 10 min in 1 ml of wash buffer A (10 mM HEPES, pH 7.6, 
10mM EDTA, 0.5mM EGTA and 0.25% Triton X-100) and subsequently for 
10 min in 1 ml wash buffer B (10 mM HEPES, pH 7.6, 200mM NaCl, 1mM 
EDTA, 0.5mM EGTA and 0.01% Triton X-100) by mixing gently. The washed 
pellets were resuspended in sonication buffer (10 mM HEPES, pH 7.6, 1mM 
EDTA and 0.5mM EGTA). Sonication was performed with a Bioruptor as 
described previously*'. After sonication, samples were supplemented with 
N-lauroylsarcosine (0.5% final) and incubated for 10 min at 4°C with gentle 
mixing. Soluble chromatin was fractionated by centrifugation at top speed for 
10 min and transferred to new eppendorf tubes. The chromatin was aliquoted, 
quick-frozen in liquid N2 and stored at −80 °C. All steps for immunoprecipitation 
were performed at 4 °C. An aliquot of sonicated chromatin was first precleared 
by mixing with Protein G Sepharose beads (GE Healthcare) or Protein A 
Sepharose beads (Sigma) for 1 h. After centrifugation, precleared chromatin was 
incubated with anti-Myc9E10 (Chemicon), anti-E(Z) or anti-H3K27me3 
(Abcam, ab6002) overnight. Protein G or protein A sepharose beads were added 
to allow binding to the antibody for 2h and then washed five times with RIPA 
buffer, once with LiCl buffer (10 mM Tris-HCl, pH 8.0, 250 mM LiCl, 0.5% NP-40, 
0.5% sodium deoxycholate and 1 mM EDTA) and twice with TE buffer. The 
beads were resuspended in 100 µl TE buffer, and treated with 0.1 mg ml−1 RNase 
A at 37 °C for 30 min. After supplementing with SDS (0.5% final), the beads were 
treated with 0.5 mg ml−1 Proteinase K at 37 °C overnight and subsequently at 
65 °C for 6 h. The immunoprecipitated DNA was recovered by phenol–chloroform 
extraction and ethanol precipitation, and then dissolved in H2O. 
Control mock immunoprecipitations were done in the same way except that no 
antibodies were added to the reaction mixture. Real-time PCR quantification of 
immunoprecipitated DNA was performed as previously described*!. The input 
DNA extracted from the same sonicated chromatin aliquots as above was used to 
plot a standard curve. Primers were as follows: for bxd PRE (FM4 and FM6), 
FM4.1, 5'-AGCAATTTGTCACCGCAAGG-3', FM4.2, 5'-GGATTTTGAGTGCGTTCTTCC-3', 
FM6.1, 5'-CCAACGGAAAAGCGAGTGG-3', and FM6.2, 5'-GCACTAAACCCCATAAAAGTC-3'; 
for PBX enhancer, PBX-enh-5', 5'-GAAAACACACAAGTGCAG-3' and PBX-enh-3', 
5'-GGAGATCCTAAAACATGC-3'; for Ubx promoter, U-up1.1, 5'-ATTCGCGAGATACCAATGCC-3' 
and U-up1.2, 5'-ATTCGCGAGATACCAATGCC-3'; for white locus, W2.1, 
5'-ATGCCACGACATCTGACC-3' and w2.3, 5'-AATGCCAGACGCTTCCTTTC-3'. 
The quantity obtained by real-time PCR was corrected to obtain the 
percentage of input. 
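
The percentage-of-input correction can be illustrated with a minimal sketch. It assumes a log-linear standard curve (threshold cycle against log10 of the input fraction) fitted to a dilution series of input DNA; the dilution values, Ct values and function names below are hypothetical and are not taken from the experiments described here.

```python
import numpy as np

def fit_standard_curve(input_fractions, ct_values):
    """Fit Ct = slope * log10(fraction of input) + intercept."""
    slope, intercept = np.polyfit(np.log10(input_fractions), ct_values, 1)
    return slope, intercept

def percent_input(ct_sample, slope, intercept):
    """Convert a sample Ct into a percentage of input via the standard curve."""
    return 100.0 * 10 ** ((ct_sample - intercept) / slope)

# Hypothetical dilution series of input DNA: 10%, 1%, 0.1%, 0.01% of input.
fractions = np.array([0.1, 0.01, 0.001, 0.0001])
# Illustrative Ct values with a slope of -3.32 cycles per decade.
cts = np.array([22.0, 25.32, 28.64, 31.96])

slope, intercept = fit_standard_curve(fractions, cts)
ip_percent = percent_input(25.32, slope, intercept)  # Ct of the 1% standard
```

A mock (no-antibody) sample can be passed through the same function to judge the background signal.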
Quantum signatures of chaos in a kicked top 


S. Chaudhury, A. Smith, B. E. Anderson, S. Ghose & P. S. Jessen 


Chaotic behaviour is ubiquitous and plays an important part in 
most fields of science. In classical physics, chaos is characterized 
by hypersensitivity of the time evolution of a system to initial con- 
ditions. Quantum mechanics does not permit a similar definition 
owing in part to the uncertainty principle, and in part to the 
Schrödinger equation, which preserves the overlap between 
quantum states. This fundamental disconnect poses a challenge to 
quantum–classical correspondence, and has motivated a long-standing 
search for quantum signatures of classical chaos. Here 
we present the experimental realization of a common paradigm for 
quantum chaos—the quantum kicked top”*— and the observation 
directly in quantum phase space of dynamics that have a chaotic 
classical counterpart. Our system is based on the combined elec- 
tronic and nuclear spin of a single atom and is therefore deep in the 
quantum regime; nevertheless, we find good correspondence 
between the quantum dynamics and classical phase space struc- 
tures. Because chaos is inherently a dynamical phenomenon, special 
significance attaches to dynamical signatures such as sensitivity to 
perturbation’” or the generation of entropy® and entanglement”*, 
for which only indirect evidence has been available”""'. We observe 
clear differences in the sensitivity to perturbation in chaotic versus 
regular, non-chaotic regimes, and present experimental evidence 
for dynamical entanglement as a signature of chaos. 

In classical mechanics, the state of a physical system is specified by a set 
of dynamical variables—for example, the position and momentum of a 
point particle—whose values define a point in phase space. Regular 
motion is associated with periodic orbits in phase space, whereas chaos 
is characterized by complex, aperiodic trajectories that diverge exponen- 
tially as a function of initial separation. This description of states and 
time evolution is fundamentally incompatible with quantum mechanics, 
where conjugate observables such as position and momentum cannot 
take on well-defined values at the same time. However, it is still possible 
to represent a quantum state in phase space, in the form of a delocalized 
quasi-probability distribution whose evolution is governed by the 
Schrödinger equation. This suggests an experiment in which one 
prepares an initial minimum uncertainty state centred on a point in 
phase space, subjects it to a desired evolution, measures the quantum 
state at successive points in time, and observes the degree to which the 
dynamically evolving quantum phase space distribution reflects the 
classical phase space structures. Experiments of this type can be simu- 
lated with classical waves’, but are very challenging for true quantum 
systems because of the overhead involved in state preparation, control 
and reconstruction. Quantum versions that accomplish several of the 
steps have been performed with cold atoms in laser standing waves'*", 
and in this Letter we complete the entire programme by including full 
quantum state reconstruction and visualizing the dynamics via complete 
phase space distributions. Placing the emphasis on dynamics comple- 
ments the much larger body of experimental work on energy level 
statistics in a broad range of physical systems'*"’. 

The experimental tools required to study quantum chaos directly 
in phase space have recently become available for the physical system 


consisting of the spin angular momentum of a single ¹³³Cs atom in 
the F = 3 hyperfine ground state. To take advantage of this, we 
have implemented a very popular model system known as the ‘kicked 
top’, consisting of a spin F whose dynamics is governed by a periodic 
Hamiltonian: 


$$H = \hbar p F_y \sum_{n=0}^{\infty} f(t - n\tau) + \hbar \frac{\kappa}{2F\tau} F_x^2 \qquad (1)$$


In the simplest case, the kick f is a δ-function, and each period of the 
classical evolution breaks down into a rotation about the y axis by a 
fixed angle p, followed by a twist (a rotation about the x axis by an 
angle proportional to F_x). The parameter κ determines the degree to 
which the dynamics are regular or chaotic. In our experiment, the 
rotation is performed by applying a short magnetic field pulse, 
whereas the twist is induced by the a.c. Stark shift (light shift) from 
a laser field tuned near the D1 resonance at 895 nm (ref. 18). Because 
the magnitude of the spin is conserved, phase space is a spherical 
surface on which each point represents a particular orientation of 
the spin, and the classical evolution can be visualized by a stroboscopic 
plot showing the state at times t = nτ. Figure 1 shows such a phase 
space plot for our kicked top, with parameters p = 0.99 and κ = 2.0. 
We see immediately that the phase space is mixed, with one large 
island of regular motion in the F_y < 0 hemisphere, two smaller islands 
in the F_y > 0 hemisphere, and a sea of chaos almost everywhere else. 
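
A stroboscopic plot of this kind can be generated with a short sketch of the classical map. It assumes δ-function kicks, a unit spin vector (X, Y, Z) = F/F, a rotation about y by the angle p and a twist about x by the angle κX; the sign conventions and the function names are illustrative choices, not taken from the experiment.

```python
import numpy as np

def kicked_top_step(v, p=0.99, kappa=2.0):
    """One period of the classical kicked top acting on a unit vector v."""
    x, y, z = v
    # Kick: rotation about the y axis by the fixed angle p.
    x1 = x * np.cos(p) + z * np.sin(p)
    z1 = -x * np.sin(p) + z * np.cos(p)
    y1 = y
    # Twist: rotation about the x axis by an angle proportional to x.
    a = kappa * x1
    y2 = y1 * np.cos(a) - z1 * np.sin(a)
    z2 = y1 * np.sin(a) + z1 * np.cos(a)
    return np.array([x1, y2, z2])

def trajectory(v0, n_kicks=40, p=0.99, kappa=2.0):
    """Stroboscopic record of the spin direction after each period."""
    v = np.asarray(v0, dtype=float)
    v = v / np.linalg.norm(v)  # project the starting point onto the unit sphere
    points = [v]
    for _ in range(n_kicks):
        v = kicked_top_step(v, p, kappa)
        points.append(v)
    return np.array(points)

# Example starting point; scatter-plotting many such trajectories on the
# sphere shows which regions are regular and which are chaotic.
orbit = trajectory([0.0, -0.99, -0.16], n_kicks=40)
```

Both steps are rotations, so the length of the spin vector is conserved and every point of the record stays on the sphere.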

To visualize a quantum state of the kicked top in phase space, one 
can expand it in a basis of spin-coherent states |θ,φ⟩, which are mini- 
mum uncertainty states with maximum projection in the directions 




Figure 1| Stroboscopic phase space plot for a classical kicked top. 
Trajectories are obtained by integrating the classical equations of motion 
and plotting the position of the spin after each kick. Depending on the 
starting point, states follow regular orbits (red), or move along chaotic 
trajectories (blue). Motion across the boundaries between regular and 
chaotic regions is classically forbidden. For this plot p = 0.99 and κ = 2.0, 
resulting in a mixed phase space that contains both regular islands of various 
sizes and a sea of chaos. The F_y < 0 and F_y > 0 hemispheres are shown 
separately, respectively left and right. 
given by the polar and azimuthal angles (θ,φ) and thus the closest 
quantum approximation to a classical spin. This produces the Husimi 
quasi-probability distribution, Q(θ,φ) = (2F + 1)⟨θ,φ|ρ|θ,φ⟩/4π, where 
ρ is the density operator for the quantum state (pure or statistical 
mixture). Q(θ,φ) is a normalized, everywhere positive function that 
comes as close as possible to a classical probability distribution in phase 
space. 
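
As an illustration of this definition, the sketch below evaluates the Husimi distribution for a spin with integer F from a density matrix written in the |F,m⟩ basis. The binomial expansion used for the spin-coherent state, its phase convention and the function names are assumptions made for the example.

```python
import numpy as np
from math import comb, pi

def spin_coherent_state(theta, phi, F=3):
    """Amplitudes of |theta,phi> in the |F,m> basis, ordered m = -F, ..., F."""
    k = np.arange(2 * F + 1)  # k = F + m
    binom = np.array([comb(2 * F, int(j)) for j in k], dtype=float)
    return (np.sqrt(binom)
            * np.cos(theta / 2) ** k
            * (np.sin(theta / 2) * np.exp(1j * phi)) ** (2 * F - k))

def husimi(rho, theta, phi, F=3):
    """Q(theta,phi) = (2F + 1) <theta,phi| rho |theta,phi> / (4 pi)."""
    psi = spin_coherent_state(theta, phi, F)
    return (2 * F + 1) / (4 * pi) * np.real(np.vdot(psi, rho @ psi))

# Example: the Husimi distribution of a pure spin-coherent state peaks at
# its own centre, where it takes the value (2F + 1) / (4 pi).
psi0 = spin_coherent_state(1.0, 0.5)
rho0 = np.outer(psi0, psi0.conj())
q_peak = husimi(rho0, 1.0, 0.5)
```

Evaluating `husimi` on a grid of angles gives one frame of the phase space distribution for a reconstructed density matrix.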

We use as a starting point for our kicked-top experiments an 
ensemble of laser-cooled Cs atoms prepared by optical pumping in 
a desired spin-coherent state ρ_0 ≈ |θ,φ⟩⟨θ,φ|. In a given run of the 
experiment, each member of the ensemble is subjected to n periods of 
the kicked-top Hamiltonian, and the entire density operator for the 
final state is experimentally reconstructed. The process is repeated 
for 0 ≤ n ≤ 40, in order to build up a stroboscopic record {ρ_n} for the 
evolving quantum state. Finally, we carry out the entire procedure for 
a series of initial states. To obtain a visual quantum–classical comparison, 
we convert each data set {ρ_n} into Husimi distributions to 
obtain a ‘stop-motion movie’ of the evolution of the state. Figure 2A 
shows selected frames from a movie obtained for an initial spin- 
coherent state centred on the stable island near F,/F=1 in the 
F, > 0 hemisphere. Successive frames clearly show the phenomenon 
of dynamical tunnelling, wherein the quantum system oscillates 
between two regions of phase space even though motion through 
the chaotic sea is classically forbidden'*"*. The observed tunnelling 
period is in good agreement with a prediction based on decomposi- 
tion of the initial state into Floquet eigenstates (Supplementary 




Figure 2 | Quantum phase space (Husimi) distributions for a quantum 
kicked top. Stroboscopic illustration of dynamical evolution, showing 
selected experimental snapshots from the first 40 periods of the kicked-top 
Hamiltonian. The period number is indicated in each frame. A, The initial 
state is a spin-coherent state centred at F_x/F = 0.70, F_y/F = 0.70, 
F_z/F = −0.16, where it is mostly contained within the boundaries of the lower 
of the pair of islands in the F_y > 0 hemisphere of Fig. 1. The state undergoes 
roughly 1.5 periods of dynamical tunnelling before coherence is lost. The 
state is almost entirely confined to the F_y > 0 hemisphere, which is the only 
one shown. B, The initial state is a spin-coherent state centred at 
F_x/F = −0.94, F_y/F = 0.31, F_z/F = −0.16, where it is mostly contained 
within the chaotic sea. The state spreads into the chaotic regions but 
generally avoids the regular islands. Both hemispheres are shown. 
Information). It is also clear that the tunnelling oscillation dephases 
after roughly one period. This is a sign of imperfections in our experiments, 
mainly a 5% variation in κ due to laser intensity variation 
across the ensemble, decoherence induced by spontaneous light 
scattering (~1 photon per 15 kicks), and background magnetic 
fields. Our data are in good quantitative agreement with a full master 
equation calculation that includes these imperfections. 

An additional, useful visualization of data of the type displayed in 
Fig. 2 can be achieved by averaging the Husimi distribution over many 
cycles. The result is a single plot showing the parts of phase space 
accessible from a given initial state. Figure 2C shows 40-period averages 
for three initial states, which together illustrate the remarkable degree 
to which our quantum kicked top reflects the boundaries between 
regular and chaotic regions in classical phase space. Although this is 
to be expected for systems in the mesoscopic regime, it is somewhat 
surprising that our deeply quantum mechanical spin should do so. 

In recent years, much attention has been directed towards dynami- 
cal signatures of chaos in quantum systems. One candidate is the 
sensitivity to perturbation, which can be quantified by the decay in 
overlap between quantum states evolving according to two slightly 
different Hamiltonians, and which can potentially reflect the classical 
Lyapunov exponent*’. The spin in our experiment is too small for the 
overlap to undergo exponential decay, but different sensitivities to 
perturbation should still be reflected in the decay of the purity of the 
spin density operator, as each spin evolves with a slightly different 
value of κ, and is coupled to the environment through light scattering. 


C, 40-period averages of evolving Husimi distributions. a–c, The initial states 
are spin-coherent states centred at a, F_x/F = 0.70, F_y/F = 0.70, F_z/F = −0.16 
(island in the F_y > 0 hemisphere, same as A), at b, F_x/F = −0.94, F_y/F = 0.31, 
F_z/F = −0.16 (in the chaotic sea, same as B), and at c, F_x/F = 0, 
F_y/F = −0.99, F_z/F = −0.16 (large island in the F_y < 0 hemisphere). The 
upper data set is the observation from experiments, while the lower data set 
is the prediction of a full theoretical model taking into account decoherence 
and κ variation. In combination, the quantum phase space distributions 
reflect the classical phase space structures of Fig. 1 with remarkable fidelity. 
To enhance contrast, the Husimi distribution in each image has been 
rescaled to fit the interval [0,1]. Each quantum state is experimentally 
reconstructed with a fidelity of ~90%. 
Figure 3 shows the experimentally measured state purity, Tr[ρ²], as a 
function of period number, for two different initial states. As 
predicted, the purity decays at very different rates in regular and 
chaotic regions. 

For systems with multiple degrees of freedom, it has been argued 
that classical chaos is linked to the dynamical generation of entanglement 
in the quantum system. Our atomic spin is the sum of 
electron and nuclear spins, F = S + I. It is therefore natural to test if 
the entanglement generated between the two is a reliable signature of 
chaos. Here we use the linear entropy S_E = 1 − Tr[ρ_E²] as our entanglement 
measure, where ρ_E is the reduced density operator for the 
electron spin. This is reasonable as long as the overall state is nearly 
pure. In our experiment, S_E reaches steady state after just a few kicks 
(Supplementary Information), and the 40-period average therefore 
serves as a convenient and robust measure of the entanglement 
generated by the dynamics. Figure 4 shows a significant dip in 
⟨S_E⟩ and correspondingly less entanglement generation for initial 
states localized in regular regions compared to those in the chaotic 
sea. This is (to our knowledge) the first experimental evidence that 
the purely quantum property of entanglement is a good signature of 
classical chaos. Note that whereas the signature is very clear for initial 
states in the large regular island in the F_y < 0 hemisphere, it is less 
apparent for states located on the small island in the F_y > 0 hemisphere. 
This loss in contrast occurs because the latter become entangled 
by dynamical tunnelling, and is therefore linked to the deeply 
quantum nature of our small spin. Contrast is further reduced by the 
sensitivity of tunnelling to κ variations and decoherence, which is 
apparent from the difference between our perturbation-free and full 
models (Supplementary Information). Tunnelling will be suppressed 
for much larger spins, and it is reasonable to assume that the distinc- 
tion between regular and chaotic regions will be more universal in 
that regime. 
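
The linear entropy of a reduced density operator can be sketched for a generic bipartite system as follows. The example uses a spin-1/2 electron (dimension 2) and a spin-7/2 nucleus (dimension 8) in an uncoupled product basis; mapping an F = 3 state into that basis needs Clebsch–Gordan coefficients, which are not shown, and the function names are illustrative.

```python
import numpy as np

def reduced_density_matrix(rho, dim_a, dim_b):
    """Trace out subsystem B from a density matrix on the space A (x) B."""
    rho4 = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.einsum('ajbj->ab', rho4)

def linear_entropy(rho_sub):
    """S = 1 - Tr[rho^2]: zero for a pure state, larger for mixed states."""
    return 1.0 - np.real(np.trace(rho_sub @ rho_sub))

dim_e, dim_n = 2, 8  # electron spin 1/2, nuclear spin 7/2

# Product state |0>|0>: no entanglement between the two spins.
psi_prod = np.zeros(dim_e * dim_n, dtype=complex)
psi_prod[0] = 1.0

# (|0>|0> + |1>|1>)/sqrt(2): maximal entanglement for a two-level electron.
psi_ent = np.zeros(dim_e * dim_n, dtype=complex)
psi_ent[0 * dim_n + 0] = 1 / np.sqrt(2)
psi_ent[1 * dim_n + 1] = 1 / np.sqrt(2)

s_prod = linear_entropy(reduced_density_matrix(np.outer(psi_prod, psi_prod.conj()), dim_e, dim_n))
s_ent = linear_entropy(reduced_density_matrix(np.outer(psi_ent, psi_ent.conj()), dim_e, dim_n))
```

For a pure overall state the linear entropy of the electron ranges from 0 (product state) to 0.5 (maximally mixed two-level system). If the overall state is itself mixed, the measure also picks up that mixedness.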

Our laboratory realization of the kicked top with atomic spins points 
the way to further studies of quantum chaos in the time domain. We are 
currently working to extend our control and measurement tools to the 
entire hyperfine ground manifold of the Cs atom”, which will provide 
access to the full state space for the coupled electron—nuclear spins. 
Besides increasing the size of state space by more than a factor of two, 
this will offer a more powerful platform for the study of entanglement 
as a quantum signature of chaos”’. To reach the semiclassical limit of 
very large spins, one can in principle implement a kicked-top 
Hamiltonian for the collective spin of an atomic ensemble, with the 
twisting interaction induced either by ultracold collisions”’, or by coup- 
ling the spins to a shared mode of a quantized electromagnetic field”*. 
Figure 3 | Sensitivity to perturbation as a quantum signature of chaos. The 
purity of the spin density operator, Tr[ρ²], shown as a function of the period 
number. a, Initial state localized on the large island in the F_y < 0 hemisphere 
(same data set as in Fig. 2C, c). b, Initial state localized in the sea of chaos 
(same data set as in Fig. 2C, b). Red circles are experimental data and the 
green lines are the predictions of a full model. As expected, perturbations, in 
the form of decoherence and κ variation across the ensemble, reduce the 
purity much faster for a state in the chaotic sea. Experimental error bars, 
±1 s.d. 
Figure 4 | Entanglement as a quantum signature of chaos. Entanglement 
between the electron and nuclear spins is quantified by the linear entropy, 
S_E = 1 − Tr[ρ_E²], of the electron reduced density operator. It is averaged over 
40 periods of the kicked-top Hamiltonian, and shown as a function of the 
centre of the initial spin coherent state |θ,φ⟩. a, Theoretical prediction for 
Schrödinger evolution, corresponding to an ideal situation without 
perturbations (no decoherence or κ variation). Colours indicate the value of 
⟨S_E⟩. b, Experimental measurements performed for states lying along the 
green cross-section in a. Also shown are the predictions of a full model (solid 
green line) and the perturbation-free model (dashed blue line) used in a. The 
black dashed line is the linear entropy of a minimally entangled pure state in 
the F = 3 manifold. A marked contrast in dynamically generated 
entanglement can be seen between regular and chaotic regions. 
Experimental error bars, ±1 s.d. 


Ultimately, this could allow experiments to address some of the most 
important outstanding questions related to quantum–classical correspondence, 
such as how to recover classical (chaotic) dynamics in open 
quantum systems subject to decoherence”* or weak measurement”. 


METHODS SUMMARY 


We prepare a spin ensemble by capturing and laser cooling ~10’ Cs atoms in a 
magneto-optical trap and optical molasses. The atoms are released into free fall, 
and optically pumped into an F = 3 spin-coherent state with respect to a fixed 
axis. A set of precision coils driven by arbitrary waveform generators apply time- 
dependent magnetic fields in a bandwidth of ~200 kHz, and generate fast and 
accurate rotations through the magnetic interaction g_F μ_B B·F, where g_F is the 
Landé g factor for the spin F and μ_B is the Bohr magneton. We use magnetic field 
pulses to prepare spin-coherent states along desired directions, and to perform 
the rotation in the kicked-top Hamiltonian. The continuous twist is induced by 
the a.c. Stark shift in a linearly polarized, monochromatic laser field, leading to 
an effective ground state Hamiltonian of the form ħξF_x² (ref. 18). In our experiment 
the magnetic kick duration is 17 µs, the peak Larmor frequency is 15 kHz, 
the twisting strength ξ = 2π × 533 Hz, and the kicked-top period is τ = 100 µs 
for κ = 2.0. The finite duration of the magnetic kick causes overlap of the rotation 
and twisting parts of the evolution, but this does not significantly alter the 
character of the dynamics and is easily taken into account in the equations of 
motion. The rotation, p = 0.99, is chosen to maximize the size of the islands in 
the F_y > 0 hemisphere, and to allow a spin coherent state to be contained mostly 
within one of these. 
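
The quoted parameters can be cross-checked with a short calculation. It assumes that the twist term of the kicked-top Hamiltonian is ħ(κ/2Fτ)F_x², so that the twisting strength is ξ = κ/(2Fτ), and it reads the values above as ξ = 2π × 533 Hz and τ = 100 µs; the variable names are illustrative.

```python
import math

F = 3                     # hyperfine spin quantum number
tau = 100e-6              # kicked-top period in seconds
xi = 2 * math.pi * 533.0  # twisting strength in rad/s

# Chaoticity parameter implied by xi = kappa / (2 F tau).
kappa = 2 * F * tau * xi
```

Under that assumption the result is κ ≈ 2.01, consistent with the stated value of 2.0.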

Following n periods of the kicked-top Hamiltonian, the information needed 
to reconstruct the final spin density operator with a fidelity of ~90% is acquired 
during a single 2-ms phase of continuous weak optical measurement and 
dynamical control. Details regarding nonlinear spin dynamics, quantum control 
and quantum state reconstruction, and the theoretical modelling of spin 
dynamics, can be found in previous work published by our group'*!?°. 
Observation of unidirectional backscattering-immune 
topological electromagnetic states 


Zheng Wang, Yidong Chong, J. D. Joannopoulos & Marin Soljačić 


One of the most striking phenomena in condensed-matter physics 
is the quantum Hall effect, which arises in two-dimensional 
electron systems'* subject to a large magnetic field applied 
perpendicular to the plane in which the electrons reside. In such 
circumstances, current is carried by electrons along the edges of 
the system, in so-called chiral edge states (CESs). These are states 
that, as a consequence of nontrivial topological properties of the 
bulk electronic band structure, have a unique directionality and 
are robust against scattering from disorder. Recently, it was 
theoretically predicted*”’ that electromagnetic analogues of such 
electronic edge states could be observed in photonic crystals, 
which are materials having refractive-index variations with a 
periodicity comparable to the wavelength of the light passing 
through them. Here we report the experimental realization and 
observation of such electromagnetic CESs in a magneto-optical 
photonic crystal’ fabricated in the microwave regime. We 
demonstrate that, like their electronic counterparts*’, electro- 
magnetic CESs can travel in only one direction and are very robust 
against scattering from disorder; we find that even large metallic 
scatterers placed in the path of the propagating edge modes do not 
induce reflections. These modes may enable the production of 
new classes of electromagnetic device and experiments that would 
be impossible using conventional reciprocal photonic states alone. 
Furthermore, our experimental demonstration and study of photo- 
nic CESs provides strong support for the generalization and applica- 
tion of topological band theories to classical and bosonic systems, 
and may lead to the realization and observation of topological phe- 
nomena in a generally much more controlled and customizable 
fashion than is typically possible with electronic systems. 

The existence of photonic CESs was first predicted*® by an analogy 
between a photonic crystal'*”° with broken time-reversal symmetry 
and a system exhibiting the quantum Hall effect (QHE). In this ana- 
logy, the electromagnetic fields play the part of the electronic current, 
the variations of permittivity and permeability within the photonic 
crystal play the part of the periodic potential and the gradients of the 
gyrotropic components of the permeability tensor play the part of the 
external d.c. magnetic field that breaks the time-reversal symmetry~’. 
The defining feature of a photonic CES is that its group velocity points 
in only one direction, which is determined by the sign of the field that 
breaks the time-reversal symmetry and the resulting unusual topo- 
logical properties of the bulk band structure. To detect the possible 
presence of non-trivial topological band properties in a photonic- 
crystal system it is sufficient’ to compute its Chern numbers. 
(Although the original proposal*® focused on “Dirac points’, it is 
not necessary to restrict to such band structures; thus, the use of a 
variety of photonic-crystal systems is possible’.) The Chern number of 
band n of a two-dimensional (2D) periodic photonic crystal is an 
integer defined by® 


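$$C_n = \frac{1}{2\pi i} \int_{\mathrm{BZ}} d^2\mathbf{k} \left( \frac{\partial \mathcal{A}_y^{nn}}{\partial k_x} - \frac{\partial \mathcal{A}_x^{nn}}{\partial k_y} \right)$$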


where the k-space integral is performed over the first Brillouin zone 
and the Berry connection’ is given by 


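$$\mathcal{A}^{nn'}(\mathbf{k}) = i \langle E_{n\mathbf{k}} | \nabla_{\mathbf{k}} | E_{n'\mathbf{k}} \rangle = i \int d^2\mathbf{r} \, \varepsilon(\mathbf{r}) \, E^{*}_{n\mathbf{k}}(\mathbf{r}) \cdot \left[ \nabla_{\mathbf{k}} E_{n'\mathbf{k}}(\mathbf{r}) \right]$$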


where E_{nk} is the periodic part of the electric-field Bloch function, an 
asterisk denotes complex conjugation and ε(r) denotes the dielectric 
function. Because the Chern number characterizes the winding number 
of the phase of the Bloch functions around the boundary of the first 
Brillouin zone", it is a ‘global’ or ‘topological’ property of the entire 
band and is very robust against structural perturbations’®. Notably, it 
can be non-zero only if the system lacks time-reversal symmetry’. 
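
The integer character of the Chern number can be illustrated numerically. The sketch below does not use the photonic-crystal Bloch functions or the ε(r)-weighted inner product defined above. It substitutes a toy two-band tight-binding Hamiltonian (the Qi–Wu–Zhang model, an assumption chosen only for simplicity) and the discretized Brillouin-zone method of Fukui, Hatsugai and Suzuki, which sums the Berry flux through each plaquette of a k-space grid.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def lower_band_state(kx, ky, m):
    """Lower-band eigenvector of the toy two-band Hamiltonian at (kx, ky)."""
    h = np.sin(kx) * SX + np.sin(ky) * SY + (m + np.cos(kx) + np.cos(ky)) * SZ
    _, vecs = np.linalg.eigh(h)  # eigenvalues are returned in ascending order
    return vecs[:, 0]

def chern_number(m, n=24):
    """Lattice Chern number of the lower band on an n x n grid of k-points."""
    ks = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    u = np.array([[lower_band_state(kx, ky, m) for ky in ks] for kx in ks])
    total_flux = 0.0
    for i in range(n):
        for j in range(n):
            u00 = u[i, j]
            u10 = u[(i + 1) % n, j]
            u11 = u[(i + 1) % n, (j + 1) % n]
            u01 = u[i, (j + 1) % n]
            # Gauge-invariant product of overlaps around one plaquette.
            loop = (np.vdot(u00, u10) * np.vdot(u10, u11)
                    * np.vdot(u11, u01) * np.vdot(u01, u00))
            total_flux += np.angle(loop)
    return int(round(total_flux / (2 * np.pi)))

c_topological = chern_number(1.0)  # gapped phase with broken time-reversal symmetry
c_trivial = chern_number(3.0)      # gapped phase with zero Chern number
```

In this toy model the lower band carries a Chern number of magnitude 1 for 0 < |m| < 2 and of 0 for |m| > 2; the sign depends on the orientation convention. The result is unchanged under small changes of m that do not close the gap, which is the robustness described above.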

One of the most interesting properties of QHE systems is that the 
Chern numbers have a direct physical significance: a finite crystal that 
supports bulk bands with non-zero Chern numbers also supports 
unidirectional CESs at its boundary at energies within bulk band 
gaps opened by the applied d.c. magnetic field. Moreover, the 
number of CESs turns out to be equal to the sum of the Chern 
numbers of all the bulk bands of lower energy’*. Although this result 
has been formally proven only in a tight-binding QHE system, it is 
believed to be independent of the details of the underlying model, 
such as the structure of the lattice and the edge. Its validity in 
photonic-crystal systems was originally predicted in refs 5 and 6, and 
corroborated through a formal mapping’ to a ‘zero-field QHE’ 
system” and ab initio numerical simulations of Maxwell’s equations’. 
It is important to emphasize that although CESs have so far been 
experimentally observed only in electronic (that is, fermionic) 
systems, the phenomenon should actually be independent of the 
underlying particle statistics because the Chern number is defined 
in terms of single-particle Bloch functions. An experimental verifica- 
tion would therefore provide strong support for the generalization of 
topological band theories and their applications to classical and 
bosonic systems. 

The ability to work with photonic-crystal band structures without 
Dirac points has allowed us to design an experimentally viable photonic- 
crystal system’ for the observation of CESs. Our experimental system 
(Fig. 1) involves a gyromagnetic, 2D-periodic photonic crystal consist- 
ing of a square lattice of ferrite rods in air (details of the structure and 
materials used can be found in Methods), bounded on one side by a 
non-magnetic metallic cladding. The interface between the photonic 
crystal and the cladding acts as a confining edge or waveguide for 
CESs. (Without this cladding, the CESs at the air edges of the photonic 
crystal would simply radiate away.) Neglecting absorption losses and 
nonlinear effects, we would expect power transmission of a CES along 
this waveguide to be independent of the waveguide geometry and also 
immune to backscattering from disorder, obstacles and defects. 

Figure 1 | Microwave waveguide supporting CESs. a, Schematic of the 
waveguide composed of an interface between a gyromagnetic photonic- 
crystal slab (blue rods) and a metal wall (yellow). The structure is 
sandwiched between two parallel copper plates (yellow) for confinement in 
the z direction and surrounded with microwave-absorbing foams (grey 
regions). Two dipole antennas, A and B, serve as feeds and/or probes. A 
variable-length (l) metal obstacle (orange) with a height equal to that of the 
waveguide (7.0 mm) is inserted between the antennas to study scattering. A 
0.20-T d.c. magnetic field is applied along the z direction using an 
electromagnet (not shown). b, Top view (photograph) of the actual 
waveguide with the top plate removed. 


Before we discuss the results of our measurements, we will first 
describe how we arrived at this particular choice of experimental 
system. We chose rods in air for the basic photonic-crystal geometry 
because of ease of fabrication. We then performed a series of numerical 
simulations for a variety of rod sizes and lattice constants on a model 
2D photonic-crystal system to optimize the band structure and 
compute corresponding band Chern numbers using material 
parameters appropriate to a low-loss ferrite (Methods). Our numerical 
simulations predicted that when the ferrite rods in this photonic 
crystal are magnetized to manifest gyrotropic permeability (which 
breaks time-reversal symmetry), a gap opens between the second 
and third transverse magnetic (TM) bands. Moreover, the second, 
third and fourth bands of this photonic crystal acquire Chern numbers 
of 1, −2 and 1, respectively. This result follows from the $C_{4v}$ symmetry 
of a non-magnetized crystal'’. The results of our simulations for the 
photonic crystal with metallic cladding are presented in Fig. 2. (Similar 
numerical results were obtained in ref. 7, albeit using a different 
material system and geometry.) Here we show the calculated field 
patterns of a photonic CES residing in the second TM band gap 
(between the second and the third bands). Because the sum of the 
Chern numbers over the first and second bands is 1, exactly one CES 
is predicted to exist at the interface between the photonic crystal and 
the metal cladding. The simulations clearly predict that this photonic 
CES is unidirectional. As side-scattering is prohibited by the bulk 
photonic band gaps in the photonic crystal and in the metallic 
cladding, the existence of the CES forces the feed dipole antennas 
(which would radiate omnidirectionally in a homogeneous medium) 
to radiate only towards the right (Fig. 2a,c). Moreover, the lack of 
any backwards-propagating mode eliminates the possibility of 
backscattering, meaning that the fields can continuously navigate 
around obstacles, as shown in Fig. 2b. Hence, the scattering from the 

Figure 2 | Photonic CESs and effects of a large scatterer. a, CES field 
distribution ($E_z$) at 4.5 GHz in the absence of the scatterer, calculated from 
finite-element steady-state analysis (COMSOL Multiphysics). The feed 
antenna (star), which is omnidirectional in homogeneous media 
(Supplementary Information), radiates only to the right along the CES 
waveguide. The black arrow represents the direction of the power flow. 

b, When a large obstacle (three lattice constants long) is inserted, forward 
transmission remains unchanged because backscattering and side-scattering 
are entirely suppressed. The calculated field pattern (colour scale) illustrates 
how the CES wraps around the scatterer. c, When antenna B is used as feed 
antenna, negligible power is transmitted to the left, as the backwards- 
propagating modes are evanescent. a, lattice constant. 


obstacle results only in a change of the phase (compare Fig. 2a and 
Fig. 2b) of the transmitted radiation, with no reduction in amplitude. 
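
The counting rule used above (the number of CESs in a gap equals the sum of the Chern numbers of all bands below it) can be illustrated with a short Python sketch. The Chern numbers of the second to fourth bands are those quoted in the text; the value 0 for the first band is inferred from the statement that the first TM gap is topologically trivial, and the function names are illustrative only.

```python
from itertools import accumulate

# Chern numbers of the four lowest TM bands.
# Bands 2-4 (1, -2, 1) are quoted in the text; band 1 (0) is an inference
# from the statement that the first gap is topologically trivial.
band_chern_numbers = [0, 1, -2, 1]


def gap_chern_sums(chern_numbers):
    """For the gap above band n, return the sum of Chern numbers of bands 1..n."""
    return list(accumulate(chern_numbers))


def edge_state_counts(chern_numbers):
    """Number of chiral edge states per gap: magnitude of the running sum.

    The sign of the running sum (dropped here) sets the propagation direction.
    """
    return [abs(total) for total in gap_chern_sums(chern_numbers)]


sums = gap_chern_sums(band_chern_numbers)
counts = edge_state_counts(band_chern_numbers)
for band_index, (total, count) in enumerate(zip(sums, counts), start=1):
    print(f"gap above band {band_index}: Chern sum = {total:+d}, edge states = {count}")
```

For the gap above the second band the running sum is 1, consistent with the single CES predicted at the interface. The sketch says nothing about whether a complete gap actually opens above a given band; that information has to come from the band-structure calculation.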

For CESs to be readily measurable in the laboratory (where it is 
necessary to use a photonic crystal of finite and manageable size) they 
must be spatially well localized, and this requires the photonic band 
gaps containing the states to be large. The sizes of the band gaps that 
contain CESs (and the frequencies at which they occur) are determined 
by the gyromagnetic constants of the ferrite rods constituting the 
photonic crystal. Under a d.c. magnetic field, microwave ferrites 
exhibit a ferromagnetic resonance at a frequency determined by the 
strength of the applied field’*. Near this frequency, the Voigt 
parameter, $V = |\mu_{xy}|/|\mu_{xx}|$ (where $\mu_{xx}$ and $\mu_{xy}$ are diagonal and off- 
diagonal elements of the permeability tensor, respectively), which is 
a direct measure of the strength of the gyromagnetic effect, is of order 
one. Such ferromagnetic resonances are among the strongest low-loss 
gyrotropic effects at room temperature and subtesla magnetic fields. 
Using ferrite rods composed of vanadium-doped calcium-iron- 
garnet under a biasing magnetic field of 0.20 T (Methods and 
Supplementary Information), we achieved a relative bandwidth of 
6% for the second TM band gap (around 4.5 GHz in Fig. 3b). As 
discussed earlier, this is the gap predicted to support a CES at the 
interface of the photonic crystal with the metallic wall. We emphasize 
again that band gaps with trivial topological properties (that is, for 
which the Chern numbers of the bulk bands of lower frequencies sum 
to zero), such as the first TM band gap (around 3 GHz in Fig. 3b), do 
not support CESs. All of the insight gained from the model 2D photo- 
nic-crystal system was then incorporated into the final design (Fig. 1). 
To emulate the states of the 2D photonic crystal, the final design 

Figure 3 | CES-facilitated waveguiding in a photonic crystal. a, Forward 
and backward transmission spectra measured using only the bulk photonic 
crystal in the set-up shown in Fig. 1 (that is, without the metal cladding and 
obstacle), with the antennas placed in the interior of the photonic crystal, in 
a 0.20-T d.c. magnetic field. The bulk transmission is reciprocal, with 
photonic band gaps at 3.3 and 4.5 GHz. b, Calculated projected photonic- 
crystal band structure (blue and grey areas). Included is the CES (red curve) 
that exists at the interface between the metal cladding and the photonic 


involved fabrication of a three-dimensional (3D) photonic-crystal slab 
structure equivalent to the model 2D photonic-crystal system, made 
from gyromagnetic rods with parallel metallic plates on the top and 
bottom, spaced to support only transverse electromagnetic modes 
(identical to the TM modes in the 2D photonic crystal; see 
Methods). A copper wall was then added at the edge of the photo- 
nic-crystal slab to provide the required cladding. 

In our experiments, the band gaps and the CES waveguide were 
characterized by two-port vector network analysis using a pair of 
dipole antennas, labelled A and B in Fig. 1a (Methods). First, to 
characterize the band gap, we inserted antennas A and B into the 
interior of the photonic crystal far from the edges and eight lattice 
constants apart. We observed the second band gap with a 50-dB 
extinction for both forward and backward transmission (with 
respective transmission coefficients $S_{BA}$ and $S_{AB}$; Fig. 3a). The 
frequency ranges of both the first and the second band gaps agree 
well with our predicted band structure calculations (no adjustable 
parameters; Fig. 3b). Next, to characterize the CESs, we measured the 
transmission spectra with the apparatus as illustrated in Fig. 1a 
(Methods). At frequencies within the second band gap, we observed 
a strong forward transmission, approximately 50 dB greater than the 
backward transmission at mid-gap frequencies (Fig. 3c). Over much 
of this frequency range, the backward transmission was below the 
noise floor of the network analyser, which suggests an even greater 
actual contrast. This difference of more than five orders of magnitude 
in power transmission, over a distance of only eight lattice constants, 
confirms that backwards-propagating modes are highly evanescent, 
as predicted. 
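
The step from a 50-dB contrast to "more than five orders of magnitude" in power is a standard decibel conversion, sketched below in Python. The per-lattice-constant figure additionally assumes that the whole contrast arises from uniform exponential decay over the eight-lattice-constant antenna separation; this is an illustrative assumption, and because the backward signal fell below the noise floor the quoted contrast is only a lower bound.

```python
import math


def db_to_power_ratio(db):
    """Convert a level difference in decibels to a power ratio."""
    return 10.0 ** (db / 10.0)


def power_ratio_to_db(ratio):
    """Convert a power ratio to a level difference in decibels."""
    return 10.0 * math.log10(ratio)


contrast_db = 50.0            # forward/backward contrast quoted in the text
separation_cells = 8          # antenna separation in lattice constants

power_ratio = db_to_power_ratio(contrast_db)
# Assumption: uniform exponential decay of the backward signal per unit cell.
decay_db_per_cell = contrast_db / separation_cells

print(f"power ratio: {power_ratio:.0e}")
print(f"implied decay: {decay_db_per_cell:.2f} dB per lattice constant")
```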

We tested the robustness of the unidirectional propagation by 
studying the effect of a large obstacle on transmission. We gradually 
inserted a conducting barrier across the waveguide, blocking the direct 
path between antennas A and B. The measured transmission behaviour 
at different stages of the insertion (Fig. 4) remains basically the same as 
that in Fig. 3c: the transmission between 4.35 and 4.62 GHz remains 
strongly non-reciprocal, with a 40-50-dB difference between the 
forward and backward transmissions. This finding agrees with the 
theoretical prediction that power transmission by means of CESs is 
fundamentally insensitive to scattering from arbitrarily large defects 
(Fig. 2b). This behaviour is a distinguishing feature of the present 
waveguide. In a conventional waveguide, insertion of such a large 
obstacle would cause very large backscattering and significantly 
reduced transmission to the output. For example, in a photonic 
crystal constructed using ordinary dielectric rods and with identical 

crystal. The grey areas are bulk bands with ill-defined band-edges due to 
large absorption near the ferromagnetic resonance. Each band’s Chern 
number is labelled. c, Measured transmission spectra upon inclusion of the 
metal cladding and antennas placed as shown in Fig. 1. In the resulting CES 
waveguide, there is very high contrast between the forward and backward 
transmissions for frequencies in the second band gap (yellow), around 
4.5 GHz. This striking unidirectionality indicates the existence of a CES. 


dimensions (Supplementary Information), a similar barrier length 
of 1.65 lattice constants reduces forward transmission by four 
orders of magnitude. This measurement further confirms that the 
backwards-propagating modes are purely evanescent, and not merely 
lossy. If lossy backwards-propagating modes existed in the system, a 
large defect would scatter a significant portion of energy into them, 
essentially converting backscattering into loss. The forward transmis- 
sion in the presence of the large defect would be much smaller than in 


[Figure 4 plot axes: Transmission (dB) versus Frequency (GHz)] 


Figure 4 | CES transmission spectra in the presence of a large scatterer. 
The length of the obstacle, l, was gradually varied from 0.40a to 1.65a (lattice 
constant, a = 40 mm); this induced only minor differences in the forward 
transmission near the mid-gap frequency of 4.5 GHz. The lack of any 
significant changes in the forward transmission, and non-reciprocity 
($|S_{AB}| \ll |S_{BA}|$) with large increases in the size of the scatterer, indicate that 
the CES can travel around the obstacle without scattering or reflections, as 
predicted by simulations. The experimental parameters remained 
unchanged from the measurement in Fig. 3c. 

the defect-free case. Existing optical isolators, such as those relying on 
Faraday rotation or non-reciprocal phase shifts, absorb or radiate 
backwards-propagating light in this way. Thus, the unidirectional 
guiding of a CES is fundamentally different from how optical isolators 
operate. 
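
As a rough consistency check, the 4.35-4.62 GHz window of strongly non-reciprocal transmission can be compared with the 6% relative bandwidth quoted for the second TM gap. The Python sketch below treats that window as if it marked the gap edges, which is an assumption made here for illustration; the function name is hypothetical.

```python
def relative_bandwidth(f_low, f_high):
    """Gap width divided by the mid-gap frequency (dimensionless)."""
    centre = 0.5 * (f_low + f_high)
    return (f_high - f_low) / centre


# Frequency window (GHz) quoted for the non-reciprocal transmission;
# treating it as the gap edges is an assumption of this sketch.
bandwidth = relative_bandwidth(4.35, 4.62)
print(f"relative bandwidth: {100.0 * bandwidth:.1f}%")
```

The result is close to the 6% figure stated for the gap, with a mid-gap frequency near 4.5 GHz.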

The experimental establishment of topological photonic states 
opens a wide range of future opportunities. First, our realization of 
nontrivial topological Chern numbers in a classical photonic system 
raises the possibility of using photonic systems to realize other classes 
of topological quantum numbers that are of interest in condensed- 
matter physics. Examples include the $Z_2$ topological number asso- 
ciated with the quantum spin Hall effect'?** and the ‘Hopf number’ 
in certain 3D insulators”. Photonic crystals are attractive for such 
investigations because parameters such as lattice constants and unit- 
cell geometries can be chosen in a fully controlled manner", unlike in 
most electronic systems. Second, the fact that the CESs in the present 
system are immune to scattering from disorder ensures that the 
design is tolerant of fabrication imperfections, such as variations in 
the lattice constant or the exact position of the guiding edge; this 
could enable implementation of extremely robust waveguides. 
Finally, photonic CESs might prove useful in applications involving 
isolators or slow light**’*. In conventional slow-light systems, 
disorder induces backscattering that increases quadratically with 
reduced group velocity”, making them very sensitive to disorder. 
Although the experiments described here were conducted at 
gigahertz frequencies, this operating frequency can be increased 
simply by applying a stronger d.c. magnetic field’*. Extension into 
the terahertz range might be achieved using metamaterials that 
resonantly enhance the magnetic activity**~°. Further extension to 
the optical regime is challenging, given the losses and weak gyrotropic 
effects in currently known materials. 


METHODS SUMMARY 


The gyromagnetic photonic crystal was constructed using a square array (lattice 
constant, a = 40 mm) of vanadium-doped calcium-iron-garnet (VCIG; TCI 
ceramics NG-1850) rods. Balancing the need for a large Voigt parameter against 
the drawback of absorption loss in the vicinity of the ferromagnetic resonance 
(5.6 GHz), we designed the rod radius to be 3.9 mm and a to be 40 mm to 
maximize the bandwidth of the band gap without suffering excessive loss. A 
16 × 10 array was used to measure the band gap of a bulk crystal and a 16 × 7 
array was used to study the waveguide and the effect of scattering. The VCIG 
ferrite has a measured relative permittivity of $\varepsilon_r = 14.63$ and a loss tangent of 
$\tan\delta = 0.00010$. The saturation magnetization was measured to be 
$M_s = 1.52 \times 10^{5}$ A m$^{-1}$, with a 3-dB linewidth of the ferromagnetic resonance 
of $\Delta H = 1.03 \times 10^{3}$ A m$^{-1}$. Using the cyclotron electromagnet at Massachusetts 
Institute of Technology, we applied a d.c. magnetic field of 0.20 T along the out- 
of-plane z direction, with a spatial non-uniformity of less than 1.5%. The d.c. 
magnetic field breaks the time-reversal symmetry in the photonic crystal. The 
magnetic field strength was measured and calibrated using a LakeShore Model 
410 gaussmeter. 
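
The quoted ferromagnetic resonance frequency and the order-one Voigt parameter can be checked with a minimal Python sketch. It rests on assumptions that are not stated in the text: a lossless Polder permeability tensor for a uniformly magnetized bulk ferrite, a gyromagnetic ratio of about 28 GHz per tesla (g ≈ 2), a saturation magnetization of 1.52 × 10^5 A/m, and an internal field equal to the applied 0.20 T (demagnetizing fields of the rods are ignored). All function names are illustrative.

```python
import math

MU0 = 4.0e-7 * math.pi        # vacuum permeability, T m / A
GAMMA_OVER_2PI = 28.0e9       # gyromagnetic ratio / 2*pi in Hz per tesla (assumes g ~ 2)


def resonance_frequency(b_field_tesla):
    """Ferromagnetic resonance frequency f0 = (gamma / 2 pi) * B, in Hz."""
    return GAMMA_OVER_2PI * b_field_tesla


def magnetization_frequency(m_sat_amp_per_m):
    """Characteristic frequency fm = (gamma / 2 pi) * mu0 * Ms, in Hz."""
    return GAMMA_OVER_2PI * MU0 * m_sat_amp_per_m


def polder_elements(f, f0, fm):
    """Lossless Polder tensor: diagonal mu_xx and magnitude of off-diagonal mu_xy."""
    denom = f0 ** 2 - f ** 2
    mu_xx = 1.0 + f0 * fm / denom
    mu_xy_abs = abs(f * fm / denom)
    return mu_xx, mu_xy_abs


def voigt_parameter(f, f0, fm):
    """Voigt parameter V = |mu_xy| / |mu_xx|."""
    mu_xx, mu_xy_abs = polder_elements(f, f0, fm)
    return mu_xy_abs / abs(mu_xx)


f0 = resonance_frequency(0.20)            # applied field from the Methods
fm = magnetization_frequency(1.52e5)      # assumed Ms in A/m
v_mid_gap = voigt_parameter(4.5e9, f0, fm)

print(f"resonance frequency: {f0 / 1e9:.2f} GHz")
print(f"Voigt parameter at 4.5 GHz: {v_mid_gap:.2f}")
```

Under these assumptions the resonance falls at the 5.6 GHz quoted above and the Voigt parameter at mid-gap comes out at roughly 0.6, that is, of order one. Loss and the finite rod geometry would modify both numbers.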


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Parallel-plate waveguide for out-of-plane confinement. The unidirectional 
CES waveguide was designed to reproduce the dispersion relation and the modal 
profile of a topological edge mode of a 2D gyromagnetic photonic crystal, using a 
3D structure with a finite height. The out-of-plane confinement in the z direction 
was achieved using two parallel horizontal copper plates, separated by 7.0 mm. 
This structure is known as a parallel-plate waveguide in microwave engineer- 
ing'*. It supports TEM modes with electric fields pointing in the out-of-plane 
z direction and magnetic fields parallel to the x-y plane. This polarization is 
identical to the TM modes in 2D photonic crystals where topological modes 
have been proposed to exist’. Between the two plates, the electromagnetic fields 
of TEM modes are also uniform along the z direction, as in a 2D system. This 3D 
structure therefore closely mimics a 2D system and is considered to be quasi-2D. 
When operated below 21 GHz, the waveguide supports only TEM modes. 
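
The 21-GHz limit is consistent with the textbook cutoff of the first higher-order mode of a parallel-plate waveguide, f_c = c/(2d), evaluated for the 7.0-mm plate spacing. The Python sketch below assumes an air-filled gap (relative permittivity 1); inside the ferrite rods the local cutoff would differ, so this is only an estimate for the air regions.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def first_higher_mode_cutoff(plate_separation_m, rel_permittivity=1.0):
    """Cutoff frequency (Hz) of the lowest non-TEM parallel-plate mode, c / (2 d sqrt(eps_r))."""
    return C0 / (2.0 * plate_separation_m * math.sqrt(rel_permittivity))


cutoff_hz = first_higher_mode_cutoff(7.0e-3)   # 7.0 mm plate spacing from the Methods
print(f"first higher-order cutoff: {cutoff_hz / 1e9:.1f} GHz")
```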
Single-mode microwave CES waveguide and absorbing boundaries. Similar to 
the case of conventional waveguides, if the edge waveguide has too large a cross- 
sectional area it could lead to multimode operation, causing both a uni- 
directional CES as well as conventional bidirectional modes to be present in 
the waveguide. To ensure that only a CES is present in the measurement set- 
up, we chose the distance between the photonic crystal and the conducting 
copper wall to be 25 mm, which is narrow enough to eliminate all bidirectional 
modes at the frequencies of the second band gap. With a 6% relative bandwidth 
for this band gap, a CES is confined within three lattice constants of the edge, 
even around a large scatterer. The copper scatterer had a height of 7.0 mm and a 
width of 7.2 mm, with its maximum length mainly limited by the finite size of the 
crystal used in this experiment. Microwave-absorbing foam pieces were placed 
along the other three edges of the photonic crystal, to prevent the CES from 
circulating all the way around the boundary of the crystal. In addition, these 
foam pieces shielded the system from external interference. 




Microwave transmission measurement for bulk crystals and for CESs. Two 
identically constructed antennas were inserted through the top copper plate, 
extending to contact the bottom copper plate. These antennas, labelled A and B 
in Fig. 1a, were connected by coaxial cables to the two ports of a Hewlett Packard 
8719C vector network analyser, which measures the transmission coefficients $S_{AB}$ 
and $S_{BA}$. Two-port short-open-load-through calibrations were performed at the 
coaxial adaptor. Therefore, measured S parameters contain a frequency-dependent 
insertion loss from the impedance mismatch between the antenna, the feed coaxial 
cable and the photonic-crystal waveguide, and from the transition between the 
balanced parallel plates and the unbalanced coax cable. This loss is reciprocal and 
does not affect the ratio of the transmission coefficients, $|S_{AB}/S_{BA}|$. Therefore, 
any substantial difference between $|S_{AB}|$ and $|S_{BA}|$ is an experimental signature 
of the unidirectionality of CESs. We extracted the forward and backward trans- 
mission spectra in a frequency sweep from 1 to 6 GHz. Each measurement was 
performed with an intermediate frequency of 20 Hz and four averages, with the 
power level normalized to the level at the band edges. To measure bulk band gaps 
(Fig. 3a), antennas A and B were located along the long axis of a 16 × 10 photonic 
crystal, eight lattice constants apart (Supplementary Information). For the CES 
waveguide (Figs 3c and 4), we performed the measurement with the feed and probe 
antennas located between the copper wall and the 16 × 7 photonic crystal, also 
eight lattice constants apart (Fig. 1a), and with the metal wall 9 mm away from each 
antenna. 

Effects of material absorption loss. Most of the propagation loss in the present 
system may be attributed to two sources: the radiation losses originating from the 
finite width of the photonic-crystal cladding and the intrinsic material absorption 
associated with the ferromagnetic resonance. The radiation loss could be further 
reduced simply by increasing the number of unit cells in the lateral direction, 
whereas the absorption loss could in principle be further reduced by using mono- 
crystalline yttrium-iron-garnet as the ferrite material'*. The resultant attenuation 
length would be on the order of hundreds of lattice constants. 

Early Palaeogene temperature evolution of the southwest Pacific Ocean 


Peter K. Bijl', Stefan Schouten’, Appy Sluijs', Gert-Jan Reichart?, James C. Zachos* & Henk Brinkhuis’ 


Relative to the present day, meridional temperature gradients in the 
Early Eocene age (~56-53 Myr ago) were unusually low, with 
slightly warmer equatorial regions’ but with much warmer subtro- 
pical Arctic” and mid-latitude’ climates. By the end of the Eocene 
epoch (~34Myr ago), the first major Antarctic ice sheets had 
appeared*”, suggesting that major cooling had taken place. Yet the 
global transition into this icehouse climate remains poorly con- 
strained, as only a few temperature records are available portraying 
the Cenozoic climatic evolution of the high southern latitudes. Here 
we present a uniquely continuous and chronostratigraphically well- 
calibrated TEX86 record of sea surface temperature (SST) from an 
ocean sediment core in the East Tasman Plateau (palaeolatitude 
~65° S). We show that southwest Pacific SSTs rose above present- 
day tropical values (to ~34°C) during the Early Eocene age 
(~53 Myr ago) and had gradually decreased to about 21 °C by the 
early Late Eocene age (~36 Myr ago). Our results imply that there 
was almost no latitudinal SST gradient between subequatorial and 
subpolar regions during the Early Eocene age (55-50 Myr ago). 
Thereafter, the latitudinal gradient markedly increased. In theory, 
if Eocene cooling was largely driven by a decrease in atmospheric 
greenhouse gas concentration’, additional processes are required to 
explain the relative stability of tropical SSTs given that there was 
more significant cooling at higher latitudes. 

The Palaeogene temperature evolution of the Antarctic margin, 
particularly the Pacific sector, is still poorly resolved. One difficulty 
with obtaining relevant records close to the Antarctic continent is the 
general absence of biogenic carbonate in most marine facies, which 
hampers traditional $\delta^{18}$O and/or Mg/Ca-based reconstructions of the 
subpolar temperature evolution. In the absence of biogenic carbonates, 
organic sea-surface-temperature proxies such as the tetraether index of 
lipids consisting of 86 carbon atoms (TEX86)’ and the alkenone 
unsaturation index (UK'37)° are required for reconstructing high-latitude 
climatic evolution””. 

We apply TEX86 and UK'37 to a stratigraphically continuous sedi- 
mentary section from the southwest Pacific Ocean, drilled by the 
Ocean Drilling Program (ODP Leg 189 Site 1172, palaeolatitude 
~65° S (ref. 10); Fig. 1). A full methodological description is available 
in Supplementary Information. The record contains an expanded 
succession of marginal marine sediments from the lower Palaeocene 
epoch to the upper Eocene (64-36 Myr ago), with tight chronostrati- 
graphic control, including magnetostratigraphy"’ (Supplementary 
Fig. 2). The presence of typical trans-Antarctic organic-walled dino- 
flagellate cysts in the Tasman region indicates an Antarctic-derived 
northward-flowing Tasman Current throughout the Palaeogene, 
which is verified by experiments based on general circulation models” 
(Fig. 1). This Antarctic influence at the East Tasman Plateau (ETP) 
persisted until at least the early Late Eocene (~35.5 Myr ago), when 


deepening of the Tasmanian Gateway led to a reorganization of the 
Tasman and proto-Leeuwin ocean currents’. 

According to the oldest part of the record, TEX86-derived SSTs at the 
ETP gradually decreased from ~25 °C around 63 Myr ago to a min- 
imum of ~20°C around 58 Myr ago (Fig. 2a). During the Late 
Palaeocene and Early Eocene, Tasman SSTs gradually rose to tropical 
values of ~34 °C during the Early Eocene climatic optimum (EECO)’, 
between 53 and 49 Myr ago (Fig. 2a). A gradual cooling trend through- 
out the Middle Eocene (starting at the termination of the EECO 
~49 Myr ago) arrived at temperatures of ~23 °C ~42 Myr ago, which 
is still relatively warm. Subsequently, an interruption of the cooling 
trend occurred at the Middle Eocene climatic optimum (MECO; 
~40 Myr ago)", followed by a relatively rapid SST decrease to 
~21°C in the early Late Eocene (Fig. 2a). The late Middle and Late 
Eocene TEX86-based SSTs are supported by UK'37 SST estimates derived 
from the same samples (Supplementary Fig. 3). Both SST estimates 
also compare well with those for other Late Eocene (Southern 
Ocean) sites’. Unfortunately, sediments from the ETP older than the 
MECO did not contain alkenones for UK'37 SST reconstructions. 

The Middle Eocene SSTs correspond closely to those from sections 
in New Zealand'>'®, according to records based on TEX86 (Fig. 2a), 
Mg/Ca and $\delta^{18}$O, indicating regional consistency of our reconstructed 
SSTs. Also, trends in our Tasman SST record are remarkably similar to 
those in the global stack of benthic foraminiferal oxygen isotopes® 
(Fig. 2b), which we updated and augmented with recently published 




Figure 1 | Site location and surface currents. Palaeogeographic 
reconstruction for the South Pacific Ocean at Early-Middle Eocene times. 
Surface circulation’* indicates the Antarctic-derived Tasman Current (TC) 
over the East Tasman Plateau. Palaeogeographic charts obtained from the 
Ocean Drilling Stratigraphic Network (ODSN); after ref. 26. The dashed red 
arrow around New Zealand indicates potential mixing of low-latitude 
surface waters (from the East Australian Current) with the TC. 

Figure 2 | Palaeogene deep-sea and sea surface temperatures. a, TEX86 
SST reconstructions from ODP Site 1172, New Zealand’*® and Tanzania‘ 
(all according to the same calibration; see Supplementary Information). The 
black wiggly lines are short (~100 kyr)”’ and longer hiatuses at Site 1172. 
b, Global stack of benthic foraminiferal oxygen isotopes (grey data; 


data (Supplementary Information). This correspondence between the 
two records (Supplementary Fig. 4) indicates that the regional SSTs 
co-varied with the SSTs where ‘global’ deep water was sourced. It has 
previously been suggested that the Southern Ocean was the main 
region of deep-water formation during the Palaeogene”’. 

In contrast, absolute SST estimates from the Tasman region are 
much higher than those inferred from the benthic foraminiferal oxygen 
isotopes (Fig. 2). Part of this discrepancy might be due to seasonality, 
with TEX86 being slightly skewed towards summer temperatures and 
benthic foraminiferal $\delta^{18}$O towards winter temperatures (Supplemen- 
tary Information). Another possibility is that deep-water formation 
occurred in areas that were cooler than the Tasman sector. SST recon- 
structions based on bivalve-shell oxygen isotopes from Seymour Island 
on the Antarctic shelf, for example, yield much lower SSTs'*. It is 
possible that the Antarctic margin was more susceptible to winter 
cooling than the open ocean, or that portions of the coast along the 
Antarctic sector were somehow isolated from the southern edges of the 
Southern Ocean gyres. Another possibility is that the aragonite bivalve 
shells integrate temperature over a greater portion of the year. 
Regardless, the large SST difference between the Weddell Sea and the 
ETP would suggest a relatively steep gradient within a few degrees of 
latitude. Antarctica, being a polar continent, would most likely 
have experienced extremes in temperature, in particular having cool 
winters. Such conditions might have been recorded in the bivalves 
from the Weddell Sea but not in the more distal ETP. In turn, deep- 
water formation might have been restricted to the Antarctic shelf areas, 
such as the Weddell Sea. 

Supplementary Information). The temperature scale assumes ice-free 
conditions ($\delta^{18}\mathrm{O}_{\mathrm{SMOW}} = -1.2$‰, where $\delta^{18}\mathrm{O}_{\mathrm{SMOW}} = (^{18}\mathrm{O}/^{16}\mathrm{O})_{\mathrm{sample}}/(^{18}\mathrm{O}/^{16}\mathrm{O})_{\mathrm{SMOW}} - 1$; SMOW, standard mean ocean water), and indicates 
deep-sea temperatures. The black solid line reflects a five-point running 
average. PETM, Palaeocene—Eocene thermal maximum. 
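
The two operations mentioned in this caption, the delta notation for isotope ratios and the five-point running average, can be sketched in Python as follows. The function names are hypothetical, the isotope ratios in the example are arbitrary numbers chosen so that the result is easy to check, and the treatment of the series ends (only full windows are averaged) is an assumption, since the caption does not say how the edges were handled.

```python
def delta_permil(ratio_sample, ratio_standard):
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


def running_average(values, window=5):
    """Centred running mean using only complete windows.

    Returns len(values) - window + 1 points; the ends are dropped rather
    than padded (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Arbitrary illustrative inputs, not data from the record.
print(delta_permil(0.9988, 1.0))
print(running_average([1, 2, 3, 4, 5, 6, 7]))
```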


Planktonic foraminiferal $\delta^{18}$O analyses from equatorial regions 
previously indicated that Palaeogene low-latitude SSTs were the same as, 
or even lower than, those of today'’, a problem that puzzled palaeo- 
climate scientists for decades. The oxygen isotopic composition of 
planktonic foraminiferal tests in porous carbonate-rich pelagic facies 
was later found to be partially altered owing to recrystallization 
primarily during early diagenesis*®*!. In contrast, carbonate-poor 
and clay-rich facies typically found on the continental margins 
contain calcite shells without major diagenetic overprint’. For the 
Eocene, such well-preserved planktonic foraminifera indicate near- 
equatorial SSTs that were greater than those of the present day, and 
agree with TEX86-derived SSTs'”!. 

Another observation from well-preserved foraminifera and TEX86 
is that (sub)equatorial SSTs were remarkably stable throughout the 
Eocene’ (Fig. 2). Stable low-latitude SSTs concomitant with high- 
latitude Eocene cooling thus suggest that there were increasing SST 
gradients during the Eocene. Although SST trends are often recon- 
structed using multi-proxy studies, the difference in absolute SSTs 
between various proxy reconstructions can be considerable”’>”’, 
even when measured on the same sediments. Despite the fact that 
multi-proxy approaches are generally encouraged in palaeoclimate 
studies, exclusion of such inter-proxy biases in latitudinal gradient 
reconstructions requires single-proxy SST records from around the 
world. Traditional calcite-based SST reconstructions are less 
suitable for this because calcite is only sparsely available in high- 
latitude sediments. The organic TEX86 and UK'37 SST proxies, however, 
can be used independently of latitude and are, hence, suitable for 
single-proxy SST gradient reconstructions. Moreover, they do not 
require critical assumptions about ancient sea-water chemistry, unlike 
$\delta^{18}$O and Mg/Ca. 

We compiled Eocene TEX86 and UK'37 SST reconstructions from a 
suite of sedimentary records from localities worldwide and noted 
increased Middle Eocene latitudinal SST gradients in both hemi- 
spheres (Fig. 3), relative to the Early Eocene. These SST gradients 
are in general agreement with those found for terrestrial mean annual 
temperatures, based on Early–Middle Eocene fossil leaves”’. Adding 
the bivalve-based SST reconstructions from Seymour Island’* to our 
organic proxy data suggests a strong gradient between 60° and 70° S, 
which contrasts with the small gradient between 60°S and the 
Equator (Fig. 3). A part of this large Southern Ocean SST gradient 
might be due to biases between organic and calcite proxies. A large 
part, however, may realistically reflect the influence of the cool 
Antarctic interior, which cooled the Antarctic shelf. In contrast to 
the continental South Pole, the Arctic region is an oceanic basin. 
Instead of amplifying the seasonal cycle, the Arctic Ocean probably 
moderated seasonal extremes in the northern high-latitude green- 
house. Hence, Palaeogene latitudinal temperature gradients, like 
those of today, would have exhibited a high degree of asymmetry 
between the two hemispheres. 
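
The latitudinal gradients in Fig. 3 are described as second-order polynomials. How such a curve can be obtained from latitude and SST pairs is sketched below as an ordinary least-squares fit in plain Python; the fitting method is an assumption, because the text does not describe the procedure. The latitudes and temperatures are synthetic, generated from a known parabola purely to show that the fit recovers its coefficients, and are not values from the compilation.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns [c0, c1, c2]."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    # Power sums that make up the 3x3 normal equations.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("x values do not determine a parabola")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


# Synthetic example only: SSTs generated from a known parabola.
true_coeffs = (30.0, 0.01, -0.002)
latitudes = [-65.0, -45.0, -20.0, 0.0, 20.0, 45.0, 65.0]
ssts = [true_coeffs[0] + true_coeffs[1] * lat + true_coeffs[2] * lat ** 2
        for lat in latitudes]

fitted = fit_quadratic(latitudes, ssts)
print([round(c, 6) for c in fitted])
```

A single parabola across both hemispheres cannot by itself capture the hemispheric asymmetry discussed above, which is one reason to treat such fits as descriptive summaries rather than physical models.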

It has been suggested that the general warmth that characterized 
early Palaeogene climates was forced by high atmospheric greenhouse 
gas concentrations®. Concomitantly, the absence of polar ice sheets 
eliminated ice-albedo feedbacks in the Palaeogene greenhouse. The 
Middle–Late Eocene global cooling has been related to long-term 
atmospheric CO2 decline, eventually resulting in the onset of major 
Antarctic glaciation around the Eocene/Oligocene boundary’. Our 
results imply that meridional temperature gradients markedly 
increased together with deep-sea cooling (Fig. 2)°. Although high 
latitudes cooled, tropical temperatures seem to have remained fairly 
stable throughout the Eocene (Fig. 3)’. This observation raises ques- 
tions concerning the precise role of decreasing atmospheric green- 
house gas concentrations in cooling the Eocene poles, as in theory~* 
they should have cooled tropical regions as well. The role of potential 

Figure 3 | Early and Middle Eocene latitudinal SST gradients. Bivalve-shell 
$\delta^{18}$O (triangles), TEX86 (squares) and UK'37 (diamonds) SST reconstructions 
for the Early (orange) and mid-Middle (blue) Eocene. Data are from 
Seymour Island"* (a), the East Tasman Plateau (b), Deep Sea Drilling Project 
(DSDP) Site 277° (c), New Zealand’*’*® (d), DSDP Site 511° (e), ODP 

Site 1090° (f), Tanzania’ (g), ODP Site 925° (h), New Jersey’ (j, k; circle 
represents peak PETM SSTs*), ODP Site 336° (m), ODP Site 913° (n) and the 
Arctic Ocean*”*”? (p) (Supplementary Fig. 1). Error bars indicate the range 
of variation. Gradients represent second-order polynomials, excluding 
bivalve-shell data. Black and dashed lines represent the present-day zonally 
averaged latitudinal temperature gradient” and age-specific deep-sea 
temperatures, respectively (Fig. 2b, ref. 6). 
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high-latitude climate feedbacks involving, for example, differences in 
cloud/water vapour distribution might have been much more 
instrumental in the Middle Eocene climatic deterioration than previ- 
ously thought. Another potential positive-feedback mechanism for 
high-latitude cooling would be ice—albedo feedback. However, the 
presence of substantial Middle Eocene continental ice is still equivocal 
given the general warmth and overall absence of conclusive physical 
evidence. 
Rapid ascent of rhyolitic magma at Chaitén volcano, Chile 

Jonathan M. Castro & Donald B. Dingwell 


Rhyolite magma has fuelled some of the Earth’s largest explosive 
volcanic eruptions’. Our understanding of these events is in- 
complete, however, owing to the previous lack of directly observed 
eruptions. Chaitén volcano, in Chile’s northern Patagonia, erupted 
rhyolite magma unexpectedly and explosively on 1 May 2008 
(ref. 2). Chaitén residents felt earthquakes about 24 hours before 
ash fell in their town and the eruption escalated into a Plinian 
column. Although such brief seismic forewarning of a major 
explosive basaltic eruption has been documented’, it is unpreced- 
ented for silicic magmas. As precursory volcanic unrest relates to 
magma migration from the storage region to the surface, the very 
short pre-eruptive warning at Chaitén probably reflects very rapid 
magma ascent through the sub-volcanic system. Here we present 
petrological and experimental data that indicate that the hydrous 
rhyolite magma at Chaitén ascended very rapidly, with velocities of 
the order of one metre per second. Such rapid ascent implies a 
transit time from storage depths greater than five kilometres to 
the near surface in about four hours. This result has implications 
for hazard mitigation because the rapidity of ascending rhyolite 
means that future eruptions may provide little warning. 

Geophysical precursors to volcanic eruptions, such as volcano-tectonic 
earthquakes, tremor and deformation, all reflect magma 
migration beneath the volcano as the magma develops an ascent 
path*>. Such signals are crucial for volcano monitoring, and increas- 
ingly, as the source mechanisms of seismicity are identified, eruption 
forecasting®°. A critical unknown that has limited the accuracy of 
eruption forecasting is the rate of magma rise before an explosive 
eruption: this parameter controls not only degassing behaviour and 
flow rheology'®"', but also the timescale of accompanying precursory 
unrest and pre-eruptive warning’*. A vast majority of andesite and 
dacite volcanic eruptions were preceded by weeks to months of 
precursory unrest, consistent with long magma ascent times 
and correspondingly sluggish (some centimetres per second) rise velo- 
cities'*. This pattern was broken on 1 May 2008 when Chaitén vol- 
cano, Chile, erupted with almost no warning at all. This explosive 
rhyolite eruption, the first ever to be scientifically monitored’, pro- 
vides a unique opportunity to assess the conditions of pre-eruptive 
magma storage and ascent at rhyolite volcanoes. Of particular 
interest is the extreme suddenness of the eruption, because this implies 
that rhyolite is highly mobile in the shallow crust. Here we constrain 
the storage conditions and pre-eruptive ascent velocity of rhyolite 
magma at Chaitén by experimentally reproducing key mineralogical 
and textural characteristics of pumice erupted from the volcano. 

Pre-eruptive unrest at Chaitén began on 30 April 2008 at about 
20:00 h Chilean Local Time (CLT) when residents of Chaitén town, 
about 10 km southwest of the volcano, felt earthquakes strong 
enough to knock objects off shelves. They first observed ash fall in 
their town on 1 May 2008 at roughly 21:00 h (CLT). Seismic activity 
continued through to 2 May 2008 when a large explosion and Plinian 
eruption column tore through a prehistoric obsidian dome in the 
Chaitén caldera. After a week of fluctuating Plinian and sub-Plinian 
activity, a new lava dome began to grow, and this activity is still 
continuing. The Plinian eruption plume distributed a broad swath 
of tephra throughout the Andes". We collected samples of this ash 
blanket from two sites located about 10 km east-southeast of the vent, 
and at another about 2 km north of the vent. The tephra deposit 
comprises ash (~80% by volume), pumice lapilli and bombs 
(~17%), and obsidian fragments (~3%). 

The pumice lapilli are rhyolitic in composition (Table 1) and 
nearly aphyric (<1 vol.% crystals). Crystals comprise both 
microphenocrysts (0.5-1.0 mm) and sparse microlites (<100 µm), which 
we identified as plagioclase and biotite in about 10% of the sampled 
(n = 40) pyroclasts. The microphenocryst mineral population 
comprises plagioclase, Fe-Ti oxides, orthopyroxene and biotite; however, 
many pumices are completely devoid of biotite. 

Plagioclase compositions are relatively uniform (~An40-45), aside 
from a few crystal cores as calcic as Angg. These microphenocrysts are 
invariably rounded with zoning patterns (Fig. 1b) characterized by 
jagged compositional boundaries, indicating several cycles of dissolu- 
tion and growth’’. The lack of euhedral overgrowth rims on these 
plagioclase microphenocrysts suggests that they were in a state of 
resorption before eruption. Orthopyroxene is euhedral, and has a 
restricted compositional range (En50-55; Table 1). Fe-Ti oxides include 
both titanomagnetite and rare ilmenite. We did not obtain reliable 
compositional analyses of biotite owing to its small size (1-2 µm wide). 
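
As an illustration of how a feldspar analysis maps onto the An notation used above, the following minimal sketch converts the representative plagioclase oxide values from Table 1 (CaO, Na2O, K2O in wt%) into molar end-member proportions. The function name and the simplification of using only Ca, Na and K cations are assumptions made for this example; the original analyses may have been reduced with a fuller mineral formula calculation.

```python
# Molar masses of the oxides in g/mol (standard values).
M_CAO = 56.077
M_NA2O = 61.979
M_K2O = 94.196


def feldspar_endmembers(cao_wt, na2o_wt, k2o_wt):
    """Return (An, Ab, Or) in mol.% from oxide weight percentages.

    Hypothetical helper: it counts only Ca, Na and K cations and
    ignores minor components such as Fe or Ba.
    """
    ca = cao_wt / M_CAO            # one Ca per CaO
    na = 2.0 * na2o_wt / M_NA2O    # two Na per Na2O
    k = 2.0 * k2o_wt / M_K2O       # two K per K2O
    total = ca + na + k
    return 100.0 * ca / total, 100.0 * na / total, 100.0 * k / total


# Representative plagioclase analysis from Table 1 (wt%).
an, ab, or_ = feldspar_endmembers(cao_wt=8.13, na2o_wt=6.5, k2o_wt=0.28)
print(f"An = {an:.1f}, Ab = {ab:.1f}, Or = {or_:.1f} mol.%")
```

With the Table 1 values this gives an anorthite content close to 40 mol.%, which falls at the lower end of the ~An40-45 range quoted for the microphenocrysts.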

Plagioclase and orthopyroxene microphenocrysts contain abundant 
small (<50 µm) rhyolitic glass inclusions (Fig. 1; Table 1). These 
inclusions contain large vesicles (>20 µm) that could reflect the entrapment 
of volatile-saturated melt during crystal growth. The dominant volatile 
component of the glass is water (H2O and OH⁻), and the 
concentrations (~1.3 to 2.3 wt%) could reflect pre- and syn-eruptive degassing 
through cracks and cleavage planes (Supplementary Information). 
The glass inclusions contain no detectable CO2 (detection limit 
~10 p.p.m.). 

Obsidian pyroclasts are mineralogically identical to the pumice; 
however, they contain larger plagioclase crystals (2-3 mm) and are 
generally more crystalline (~2-5 vol.%). The obsidians are low in 
H2O (0.5-1 wt%; Supplementary Information) and devoid of CO2. It 
is not possible to prove whether the obsidian pyroclasts are juvenile or 
lithic fragments derived from the obsidian dome in the Chaitén 
caldera, because this prehistoric lava dome is compositionally indis- 
tinguishable from the new magma’’. We therefore focus the rest of 
this analysis and discussion on the petrogenesis of the Chaitén 
pumice, which is undoubtedly a juvenile eruption product. 

The crystal complement in the Chaitén rhyolite records key 
information about pre-eruptive magma storage and ascent, as the 
characteristics of mineral phases are all functions of pressure (P), 
temperature (T) and the melt-H2O content. These intensive parameters 
may be constrained through petrological experiments’”’*. One 
complication is that the plagioclase microphenocrysts appear to have been 
unstable in the melt before eruption, and as such could be xenocrystic. 
As we show below, with the exception of rare calcic cores, plagioclase 
compositions are compatible with the rhyolite melt over a range of 
P_H2O-T space. Furthermore, the abundance of glass inclusions in the 
microphenocrysts, whose major element compositions are identical to 
the pumice matrix glass, suggests that the plagioclase crystals are 
indeed primary. 

We performed petrological experiments'* ” on a powdered pumice 
pyroclast over a range of P_H2O-T conditions (Supplementary 
Information). Given the explosive nature of the eruption, and the lack 
of CO2 in melt inclusions, we assumed that the pre-eruption magma 
was water-saturated, and added just enough water to the pumice 
powder to achieve water-saturation. These experiments (Fig. 2) 
indicate that the crystal population is stable in hydrous rhyolite melt over a 
wide P_H2O-T range (~50-200 MPa; ~780-850 °C). Permissible 
magma storage conditions may be further bracketed by considering 
that the natural plagioclase (~An40-45) and orthopyroxene 
(~En50-55) are together reproduced at about T < 825 °C and 
P > 120 MPa. These conditions are consistent with temperatures 
calculated from compositions of titanomagnetite-ilmenite pairs in 
the pumice (~800 ± 10 °C)'. The pressure minimum (120 MPa) 
corresponds to a magma chamber depth of about 5 km (±0.5 km), 
assuming a range of country rock densities (2,300-2,700 kg m⁻³). 
Higher magma storage pressures may be warranted, as the presence 
of biotite in some samples implies T < 800 °C, and consequently 
higher P_H2O, to stabilize plagioclase of the natural composition. 
Taking these observations into account, a range of magma storage 
conditions is possible in P_H2O-T space (Fig. 2). 

Table 1 | Representative compositions of the Chaitén pumice and microphenocryst mineral phases 

Component       Matrix glass†     Bulk‡             Glass inclusions† Plagioclase       Magnetite         Ilmenite          Orthopyroxene† 
Sample          Ch-1-08           Ch-1-08           Ch-inc-1          Ch-plg2-08        C1-grn1           C1-grn2           Ch-opx1-08 
n               76                5                 10                20                9                 9                 15 
SiO2            76.1 (0.5)        75.6 (0.4)        76.1 (0.3)        59.2 (0.3)        n.d.              n.d.              48.3 (0.5) 
Al2O3           13.7 (0.2)        13.9 (0.2)        13.0 (0.4)        26.1 (0.5)        2.57 (0.05)       0.17 (0.01)       1.69 (0.2) 
TiO2            0.13 (0.01)       0.14 (0.03)       0.5 (0.1)         n.d.              8.7 (0.07)        45.6 (0.19)       0.13 (0.02) 
Fe2O3           1.27 (0.1)        1.5 (0.02)        1.42 (0.01)       0.16 (0.03)       90.3 (0.34)       54.6 (0.39)       31.2 (1.4) 
MgO             0.28 (0.01)       0.26 (0.1)        0.29 (0.02)       n.d.              0.91 (0.03)       1.80 (0.04)       16.6 (0.3) 
MnO             0.06 (0.01)       0.05 (0.01)       0.06 (0.03)       n.d.              0.54 (0.04)       0.86 (0.04)       1.79 (0.2) 
Cr2O3           n.d.              n.d.              n.d.              n.d.              0.02 (0.02)       0.02 (0.01)       n.d. 
CaO             1.41 (0.01)       1.46 (0.02)       1.10 (0.03)       8.13 (0.2)        n.d.              n.d.              0.34 (0.03) 
Na2O            4.00 (0.1)        4.04 (0.02)       3.91 (0.2)        6.5 (0.06)        n.d.              n.d.              n.d. 
K2O             2.98 (0.04)       2.93 (0.05)       3.10 (0.1)        0.28 (0.02)       n.d.              n.d.              n.d. 
P2O5            0.04 (0.03)       0.06 (0.01)       0.01              n.d.              n.d.              n.d.              n.d. 
SO2 (p.p.m.)    31 (15)           n.d.              500               n.d.              n.d.              n.d.              n.d. 
Cl (p.p.m.)     925 (66)          n.d.              3550              n.d.              n.d.              n.d.              n.d. 
Total           100.0 (0.45)      99.9 (0.77)       99.5 (1.02)       100.4 (0.53)      103.0 (0.34)      103.0 (0.44)      100.1 (0.40) 

†EPMA; ‡Bulk pumice X-ray fluorescence analysis. 
n.d.: not detected. An:Ab:Or = 58:40:02 

Figure 1 | Backscattered electron micrographs of Chaitén pumice 
pyroclasts. a, Plinian pumice with a plagioclase microphenocryst set within 
microlite-free vesicular glass. b, Close-up view of the lower right side of the 
microphenocryst in a. Zones of different grey-value comprise oscillatory 
zoning in which the anorthite content varies from about An40 in dark-grey 
zones to about An,; in brighter regions. The dark grey blob at the centre is a 
hydrous rhyolite glass inclusion. c, Plinian pumice with biotite microlite at 
the grain centre. d, Plinian pumice pyroclast fragment with an 
orthopyroxene microphenocryst (light-grey elongate). 

Figure 2 | H2O-saturated phase relations in the Chaitén rhyolite. Mineral-in 
curves illustrate the stability limits of the natural microphenocryst 
minerals (plag, plagioclase; opx, orthopyroxene; Fe-Ti-ox, titanomagnetite). 
The symbol ‘))’ indicates ‘reversal’ experiments in which pre-annealed 
aliquots of crystal-rich material were subjected to higher temperature. Fine 
dashed lines are isopleths contouring the average An-content (mol.%) in 
plagioclase microlites and overgrowth rims. The En-contents (mol.%) are 
given for selected experiments. The red region demarcates a permissible 
magma storage P_H2O-T zone based on matching the experimental phase 
assemblage, mineral compositions, and crystallinity with those features 
observed in the Plinian pumice. fO2, oxygen fugacity. 

All experiments conducted at P_H2O and/or a T less than the 
inferred storage conditions grew abundant microlites (~10-40 vol.%), 
in addition to euhedral overgrowth rims on plagioclase 
fragments over timescales of days (Supplementary Information). The 
nearly aphyric character of the natural pumice is in contrast to this, 
and therefore must indicate magma storage at near-liquidus 
conditions, and then very rapid ascent to the surface. That the plagioclase 
microphenocrysts remained rounded during magma ascent across a 
P_H2O-T space that should have promoted plagioclase crystallization 
(Fig. 2) indicates that Chaitén magma rose faster than some threshold 
rate that would have allowed plagioclase rim growth. To quantify this 
rate, we performed decompression experiments” along 
isotherms that bracket the range of possible storage conditions (780 
and 825 °C). The starting pressures (200 and 150 MPa) lie just above 
the plagioclase liquidus such that any microphenocryst fragments 
included in the annealed powder were partly resorbed before 
decompression. We equilibrated aliquots of the powdered pumice at the 
starting conditions for three days, and then decompressed charges to 
a final pressure of 30 MPa in a series of 5 MPa steps. The decompression 
intervals range from about 4 to 17 h. The dwell periods (7.5-15 min) 
between decompression steps define linear decompression rates of 10, 
20 and 40 MPa h⁻¹. 

Figure 3 | Montage of backscattered electron images collected on 
decompression experiments on the Chaitén pumice. Each image shows one 
or more plagioclase microphenocrysts with an overgrowth of new plagioclase, 
appearing as a darker rim (a, b, d, e) and without overgrowth (c, f). At 780 °C 
and 825 °C, decompression rates higher than 20 MPa h⁻¹ prevent the 
overgrowth of plagioclase owing to very short crystallization intervals. 
[Recoverable panel labels: 825 °C, 10 MPa h⁻¹; 825 °C, 20 MPa h⁻¹; 
825 °C, 40 MPa h⁻¹; 780 °C, 10 MPa h⁻¹.] 

Faceted plagioclase rims grew at 10 and 20 MPa h⁻¹ at both 780 
and 825 °C, but did not grow at 40 MPa h⁻¹ in the 825 °C experiment 
(Fig. 3). Very thin (<3 µm), discontinuous rims grew in the 
40 MPa h⁻¹, 780 °C run, but these rims did not grow in a subsequent 
experiment at 50 MPa h⁻¹. Therefore, the decompression rate that 
precludes the formation of plagioclase rims is similar at 780 and 
825 °C (~40 MPa h⁻¹), and can be considered a minimum value for 
the Chaitén magma. This rate corresponds to an average ascent 
velocity of about 0.5 m s⁻¹, considering a bubble-free magma overburden 
density of about 2,300 kg m⁻³. 
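
The conversions used in this section (storage pressure to depth, and decompression rate to ascent velocity) can be reproduced with a simple lithostatic relation, P = ρgh. The sketch below is only an illustration under that assumption; the function names and the value g = 9.81 m s⁻² are choices made for this example, not details taken from the original work.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def lithostatic_depth_km(pressure_mpa, density):
    """Depth (km) at which an overburden of the given density (kg/m^3)
    exerts the given pressure (MPa), from P = rho * g * h."""
    return pressure_mpa * 1e6 / (density * G) / 1e3


def ascent_velocity(rate_mpa_per_h, density):
    """Ascent velocity (m/s) equivalent to a linear decompression rate
    (MPa/h) beneath an overburden of the given density (kg/m^3)."""
    rate_pa_per_s = rate_mpa_per_h * 1e6 / 3600.0
    return rate_pa_per_s / (density * G)


def decompression_duration_h(p_start_mpa, p_end_mpa, rate_mpa_per_h):
    """Duration (h) of a linear decompression between two pressures."""
    return (p_start_mpa - p_end_mpa) / rate_mpa_per_h


# Minimum storage pressure (120 MPa) for the quoted country-rock densities.
for rho in (2300.0, 2700.0):
    print(f"rho = {rho:.0f} kg/m^3 -> depth = {lithostatic_depth_km(120.0, rho):.1f} km")

# Limiting decompression rate (40 MPa/h) beneath bubble-free magma.
print(f"ascent velocity = {ascent_velocity(40.0, 2300.0):.2f} m/s")

# Slowest and fastest runs started from 200 MPa and quenched at 30 MPa.
print(decompression_duration_h(200.0, 30.0, 10.0), "h at 10 MPa/h")
print(decompression_duration_h(200.0, 30.0, 40.0), "h at 40 MPa/h")
```

Under these assumptions, 120 MPa corresponds to roughly 4.5-5.3 km for densities of 2,700-2,300 kg m⁻³, and 40 MPa h⁻¹ corresponds to about 0.49 m s⁻¹, in line with the depth of about 5 km and the velocity of about 0.5 m s⁻¹ given in the text.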

We note that the limiting decompression rates are within the viscous 
regime and are therefore consistent with intact magma ascent 
presaging the explosive failure at higher decompression rates****. In 
other words, the decompression history recorded in the pumice 
probably represents pre-fragmentation ascent. The ascent rate inferred 
from decompression experiments (~0.5 m s⁻¹) would correspond to 
strain rates (= ascent rate/conduit radius; r = 10-100 m) of about 
10⁻² to 10⁻³ s⁻¹. The viscosity of the Chaitén magma over a range 
of temperature and water contents”? (750-825 °C, 1-4 wt% H2O) is 
still at least one order of magnitude lower (~ 10°-10° Pas) than 
the critical values required to cause a glassy response of the magma 
(~10⁹-10¹⁰ Pa s) at the implied shear strain rates”®. This result indicates 
that during much of its rise in the conduit, the Chaitén magma would 
not have been capable of autobrecciating as a result of shear’’, unlike 
silicic magma in its final stages of ascent in lava dome eruptions*”. 
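
The strain-rate estimate in this paragraph follows directly from the stated definition (ascent rate divided by conduit radius). The sketch below evaluates it for the two bounding radii and, as a loudly labelled assumption, compares it with a Maxwell-type criterion for the onset of a glassy (brittle) response, in which failure is expected once the strain rate exceeds k·G∞/η. The constants k = 0.01 and G∞ = 10 GPa are commonly used values supplied here for illustration; they are not stated in the text above, and the function names are hypothetical.

```python
ASCENT_RATE = 0.5          # m/s, from the decompression experiments
K_FAILURE = 0.01           # assumed dimensionless constant in the criterion
G_INFINITY = 1.0e10        # Pa, assumed unrelaxed shear modulus of the melt


def strain_rate(ascent_rate, conduit_radius):
    """Characteristic shear strain rate (1/s) = ascent rate / conduit radius."""
    return ascent_rate / conduit_radius


def critical_viscosity(rate):
    """Viscosity (Pa s) above which melt deforming at the given strain rate
    would respond as a glass under the assumed Maxwell-type criterion."""
    return K_FAILURE * G_INFINITY / rate


for radius in (10.0, 100.0):
    rate = strain_rate(ASCENT_RATE, radius)
    print(f"r = {radius:.0f} m: strain rate = {rate:.3g} 1/s, "
          f"critical viscosity = {critical_viscosity(rate):.1e} Pa s")
```

For radii of 10 and 100 m this gives strain rates of 0.05 and 0.005 s⁻¹. Magma with a viscosity well below the corresponding critical values would be expected to flow rather than fracture, which is the sense of the argument made above.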

Our results show for the first time that rhyolite magma can ascend 
very rapidly from depth (>5 km) before explosive fragmentation. The 
magma ascent timescale at Chaitén was brief (~4 h), and shorter than 
the period of the felt seismic unrest (~1 day) that preceded the erup- 
tion. That the earthquake swarm duration exceeded the pre-eruptive 
ascent timescale may reflect preparatory fracturing and the formation 
of the magma’s pathway to the surface**, or perhaps the swarm was an 
eruption trigger. Nonetheless, the brevity of pre-eruptive magma rise at 
Chaitén is clear evidence that near-liquidus, hydrous rhyolite is very 
fluid and, in all likelihood, capable of creating and transiting a magma 
transport system” on timescales that are difficult to prepare for, espe- 
cially in the absence of monitoring instruments. Our findings therefore 
emphasize the need to monitor rhyolite volcanoes that have undergone 
Holocene rhyolitic activity. In more densely populated regions this 
would be essential to avoid a major volcanic disaster. 


METHODS SUMMARY 


Glass and mineral compositions were analysed using Cameca SX-100 and JEOL 
JXA-8900R electron microprobes at the University of Munich and the 
Smithsonian Institution, respectively. Glasses were analysed with an accelerating 
voltage of 12-15 keV, a 10-20 µm beam, and 10 nA beam current; mineral analyses 
used a 3-5 µm beam and the same acceleration voltage and current. Standardization 
was performed on quartz (Si), anorthite (Ca), bytownite (Al), corundum (Al), 
microcline (K), albite (Na), hornblende (Fe, Mg), ilmenite (Ti, Fe), and chromite 
(Cr). Na was analysed first in all routines to minimize migration effects. 

We analysed very small (<10 µm) plagioclase and orthopyroxene microlites 
by quantitative EDS on a FEI field emission scanning electron microscope (SEM) 
at the Smithsonian Institution. Analytical conditions consisted of 10-12 keV, a 
beam current of 0.5-1 nA, spot size of ~1 µm, and 5 mm working 
distance. We calibrated the instrument against plagioclase, pyroxene, and glass 
standards the compositions of which were independently analysed by either 
electron probe microanalyser (EPMA) or wet chemistry. Reproducibility of 
the standard, experimental and natural mineral compositions with quantitative 
EDS was good, as reflected by errors of about ±3.0 mol% An and ±5.0 mol% En 
relative to the EPMA and wet chemical values. 

The water contents of obsidian chips and glass inclusions were determined 
with synchrotron-source Fourier transform infrared (FTIR) spectroscopy at the 
Lawrence Berkeley National Laboratory Advanced Light Source, according to 
techniques described in ref. 30. 

Hydrothermal phase equilibrium and decompression experiments were con- 
ducted in water-pressurized Waspaloy cold-seal vessels with nickel filler rods, 
according to methods described in refs 18 and 22. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Pumice samples were collected on 21 June 2008 at two locations approximately 
12 and 2 km from the active vent. The samples in both locations were probably 
erupted on 3 May 2008, based on the sample locations relative to the trajectory of 
wind-dispersed ash plumes as recorded by NASA satellite photos. Serial sections 
of the air-fall deposit were sampled across its entire thickness. Samples from all 
horizons were analysed to establish whether any time-variations in chemistry or 
mineralogy occurred in the pumice. No such variations have been detected. 
Samples were sieved to determine the weight fractions of the different compo- 
nents. Mineral identification was carried out with optical and scanning electron 
microscopes. 

Fe-Ti oxide grains from the Plinian pumice were analysed by a Cameca SX-100 
EPMA. Grains were hand-picked from the 0.5—1 mm size fraction, using mag- 
netism and appearance of the crystals as observed under a binocular microscope. 
All grains were enclosed in a small selvage of vesicular glass. About 200 grains 
were selected in this manner, mounted in epoxy, and polished for EPMA ana- 
lysis. Analytical conditions for the oxide grains included 15 keV acceleration 
voltage, a 20 nA beam current and focused spot. We calibrated the EPMA with 
well-characterized mineral standards: haematite (Fe), ilmenite (Ti, Mn), chro- 
mite (Cr), corundum (Al) and periclase (Mg). Analyses of unknowns were 
interspersed with measurements of the mineral standards to check for instru- 
ment drift. 

Fe-Ti oxide compositional data were used to calculate the temperature and 
oxygen fugacity of the Chaitén rhyolite. Of the 200 Fe-Ti oxide grains analysed, 
only five titanomagnetite—ilmenite pairs were found to be in contact with one 
another, and hence suitable for estimating intensive parameters. We tested that 
these oxide pairs had been in equilibrium with each other by analysing Mg/Mn 
and using the empirical model of ref. 31. All pairs were found to be in equilib- 
rium based on their Mg/Mn values (Supplementary Information Table 4). We 
used the two-oxide solid solution model of ref. 21 to calculate temperature and 



Glass, plagioclase and pyroxene compositions were analysed using Cameca SX-100 
and JEOL JXA-8900R electron microprobes at the University of Munich and 
the Smithsonian Institution, respectively. Glasses were analysed with an accelerating 
voltage of 12-15 keV, a 10-20 µm beam and 10 nA beam current; mineral 
analyses used a 3-5 µm beam and the same acceleration voltage and current. 
Standardization was performed on quartz (Si), anorthite (Ca), bytownite (Al), 
microcline (K), albite (Na), hornblende (Fe, Mg), ilmenite (Ti, Fe), and chromite 
(Cr). Na was analysed first in all routines to minimize migration effects. 
We analysed some very small (<10 µm) plagioclase and orthopyroxene 
microlites by quantitative energy dispersive spectroscopy (EDS) on a FEI field 
emission SEM at the Smithsonian Institution. Analytical conditions comprised: 
10-12 keV, a beam current of 0.5-1 nA, spot size of ~1 µm, and 5 mm 
working distance. We calibrated the instrument against plagioclase, pyroxene, 
and glass standards the compositions of which were independently analysed by 
either EPMA or wet chemistry. Reproducibility of the standard, experimental, 
and natural mineral compositions with quantitative EDS was good, as reflected 
by errors of about ±3.0 mol% An and ±5.0 mol% En relative to the EPMA and 
wet chemical values. 

The water contents of obsidian chips and glass inclusions were determined with 
synchrotron-source FTIR at the Lawrence Berkeley National Laboratory 
Advanced Light Source, according to techniques described in ref. 29. All measure- 
ments were made in transmission mode on doubly polished wafers ranging in 
thickness from 30 to 200 µm. 

Hydrothermal experiments on a powdered aliquot of Chaitén pumice were 
conducted in water-pressurized Waspaloy cold-seal vessels with nickel filler rods, 
according to methods described in refs 16 and 21. All experiments were run at an 
ambient oxygen fugacity of approximately NNO+1 log unit. Pumice powder, 
along with enough distilled water to ensure H2O-saturation (P_H2O = P_total) at the 
run conditions, were loaded into Au tubes and welded shut. Capsules were 
weighed before and after the experiments to confirm that they remained closed 
during the runs. Experiments that underwent weight loss were discarded. 
Experiments were quenched first with air and then by immersion in a cold-water 
bath. 

Decompression experiments were first equilibrated at the starting conditions 
(200 MPa, 750 °C; 150 MPa, 825 °C) for 1 to 3 days. Runs were then 
decompressed by a series of equal-sized steps (5 MPa) with a hand-operated pressure 
intensifier. The size of the intervening dwell periods at the intermediate pressures 
established three model decompression rates (10, 20 and 40 MPa h⁻¹). The final 
quench pressure in decompression experiments (30 MPa) was chosen as an 
intermediate to the pressures implied by the water contents of obsidian pyro- 
clasts and the plagioclase-hosted glass inclusions, assuming that these water 
contents reflect equilibrium solubility values’. 


31. Bacon, C. R. & Hirschmann, M. M. Mg/Mn partitioning as a test for equilibrium 
between coexisting Fe-Ti oxides. Am. Mineral. 73, 57-61 (1988). 

32. Silver, L. A., Ihinger, P. D. & Stolper, E. The influence of bulk composition on the 
speciation of water in silicate glasses. Contrib. Mineral. Petrol. 104, 142-162 
(1989). 
Gene therapy for red-green colour blindness in adult primates 

Katherine Mancuso, William W. Hauswirth, Qiuhong Li, Thomas B. Connor, 
James A. Kuchenbecker, Matthew C. Mauck, Jay Neitz & Maureen Neitz 


Red-green colour blindness, which results from the absence of 
either the long- (L) or the middle- (M) wavelength-sensitive visual 
photopigments, is the most common single locus genetic disorder. 
Here we explore the possibility of curing colour blindness using 
gene therapy in experiments on adult monkeys that had been 
colour blind since birth. A third type of cone pigment was added 
to dichromatic retinas, providing the receptoral basis for trichro- 
matic colour vision. This opened a new avenue to explore the 
requirements for establishing the neural circuits for a new dimen- 
sion of colour sensation. Classic visual deprivation experiments’ 
have led to the expectation that neural connections established 
during development would not appropriately process an input 
that was not present from birth. Therefore, it was believed that 
the treatment of congenital vision disorders would be ineffective 
unless administered to the very young. However, here we show 
that the addition of a third opsin in adult red-green colour- 
deficient primates was sufficient to produce trichromatic colour 
vision behaviour. Thus, trichromacy can arise from a single addi- 
tion of a third cone class and it does not require an early develop- 
mental process. This provides a positive outlook for the potential 
of gene therapy to cure adult vision disorders. 

Gene therapy was performed on adult squirrel monkeys (Saimiri 
sciureus) that were missing the L-opsin gene. In this species, some 
females have trichromatic colour vision whereas males are red—green 
colour blind’. Serotype 2/5 recombinant adeno-associated virus 
(rAAV) containing a human L-opsin gene under the control of the 
L/M-opsin enhancer and promoter (Fig. 1a) was delivered to the 
photoreceptor layer by subretinal injections (see Methods). 
Transcriptional regulatory elements were chosen to direct expression 
preferentially in M cones, but not short- (S) wavelength-sensitive 
cones or rods*. To provide the receptoral basis for trichromacy, 
animals received three 100-µl injections (containing a total of 
2.7 X 10'° viral particles) in each eye, which produced a relatively 
uniform, third submosaic of approximately 15-36% of M cones that 
coexpressed the transgene (Fig. 1e, f). 

Before treatment, monkeys were trained to perform a computer- 
based colour vision test, the Cambridge Colour Test**, which was 
modified for use with animals® (Fig. 2a). Dichromats who are missing 
either the L- or the M-photopigment fail to distinguish from grey: 
colours near the so-called ‘spectral neutral point’ located in the blue- 
green region of colour space (near dominant wavelength of 490 nm) 
and complementary colours near the ‘extra-spectral neutral point’ in 
the red-violet region (near dominant wavelength of −499 nm). 
Whereas trichromats have the four main hue percepts blue, yellow, 
red and green, dichromats only have two percepts, nominally blue 
and yellow. Before treatment, two dichromatic monkeys completed 
three colour vision tests consisting of 16 hues (Fig. 2b, c). Four-to-six 
months were required to test all 16 hues; thus, baseline results 
represent testing conducted for more than a year. As predicted, 
before treatment monkeys had low thresholds (averaging <0.03 units 
in u′, v′ colour space) for colours that represent blues and yellows to 
their eyes, but always failed to discriminate between the blue-green 
and the red-violet (dominant wavelengths of 490 nm and −499 nm, 
respectively) hues, with thresholds extrapolated from psychometric 
functions being orders of magnitude higher (Fig. 2b, c). Results were 
highly repeatable, with no improvement between the first and third 
tests, making us confident that the animals would not spontaneously 
improve in the absence of treatment. 

Figure 1 | rAAV2/5 vector produced functional L-opsin in primate retina. 
a, Molecular map. LCR, locus control region; PA;, polyadenylation signal; 
PP, proximal promoter; RHLOPS, recombinant human L-opsin cDNA; SD/SA, 
splice donor/acceptor; TR, terminal repeats. b, Red light mf-ERG 
stimulus. c, mf-ERG 40 weeks after two injections (yellow circles) of a 
mixture of L-opsin- and GFP-coding viruses. Grey lines show borders of 
highest response. For comparison, the inset shows mf-ERG 16 weeks after 
injection; there was no reliable signal from L-opsin, unchanged from 
baseline. High responses in far peripheral retina were measured reliably and 
may have originated from offshoot of one of the injections. d, Fluorescence 
photographs from a similar retinal area as in c; grey lines from c were copied 
in d. e, Confocal microscopy showed a mosaic pattern of GFP expression in 
5-12% of cones. Because GFP-coding virus was diluted to one-third 
compared to L-opsin virus, an estimated 15-36% of cones in behaviourally 
tested animals express L-opsin. f, mf-ERG from a behaviourally tested 
animal 70 weeks after three injections of L-opsin virus. 

Figure 2 | Pre-therapy colour vision and possible treatment outcomes. 
a, Colour-vision stimuli examples. b, Pre-therapy results, monkey 1. Hues 
tested are represented as dominant wavelengths rather than u′, v′ 
coordinates. If a hue could not be reliably distinguished at even the highest 
saturation, the extrapolated threshold approached infinity. c, Pre-therapy 
results, monkey 2. d, e, Possible experimental outcomes: monkeys could 
have a relative increase in long-wavelength sensitivity, but remain 
dichromatic (dashed lines, d); theoretical colour spectrum appearances for a 
dichromat and a possible ‘spectral shift’ are shown. Alternatively, 
dichromatic monkeys could become trichromatic. Results from a 
trichromatic female control monkey are plotted (dashed line, e). Error bars 
denote s.e.m.; n varied from 7-11. 

Co-expressing the L-opsin transgene within a subset of endo- 
genous M-cones shifted their spectral sensitivity to respond to long 
wavelength light, thus producing two distinct cone types absorbing in 
the middle-to-long wavelengths, as required for trichromacy. The 
spectral sensitivity shift was readily detected using a custom-built 
wide-field colour multifocal electroretinogram (mf-ERG) system 
(Fig. 1b, c, f) (see ref. 7 for details). In preliminary experiments, 
validity of the colour mf-ERG was tested using an animal that had 
received a mixture of the L-opsin-coding virus plus an identical virus, 
except that a green fluorescent protein (GFP) gene replaced the 
L-opsin gene. As reported previously, faint GFP fluorescence was 
first detected at 9 weeks post-injection, and it continued to increase 
in area and intensity over 24 weeks*. Although faint signs of GFP 
were first detectable at 9 weeks, L-opsin levels sufficient to produce 
suprathreshold mf-ERG signals were still not present at 16 weeks
post-injection (Fig. 1c, inset). After GFP fluorescence became robust, 
the red light mf-ERG, which indicates responses from the introduced 
L-opsin, showed highly increased response amplitudes in two areas 
(Fig. 1c) corresponding to locations of subretinal injections (Fig. 1d). 

The two dichromatic monkeys who participated in behavioural 
tests of colour vision were treated with L-opsin-coding virus only. 
Although the elongated pattern produced by two injections in Fig. 1c, d 
allowed mf-ERG validation, the treatment goal was to produce a 
homogeneous region, as resulted from three injections shown in 
Fig. 1f, in which the highest mf-ERG response covered about 80° of 
the central retina—roughly the area for which humans have good 
red-green discrimination. These results demonstrate that gene therapy 
changed the spectral sensitivity of a subset of the cones. A priori, there 
were two possibilities for how a change in spectral sensitivity might 
change colour vision behaviour. First, animals may have an increase in 
sensitivity to long-wavelength light, but if the neural circuitry for 
extracting colour information from the nascent ‘M + L cone’ submo- 
saic was absent, they would remain dichromatic—the hallmark of 
which is having two hues that are indistinguishable from grey 
(Fig. 2d). The spectral neutral point for individuals that have only S 
and M cones (for example, monkeys 1 and 2 pre-therapy) occurs near 
the dominant wavelength of 495 nm. At the limit, an increase in
spectral sensitivity would shift the monkeys’ neutral point towards that 
of individuals with only S and L cones, near the dominant wavelength 
of 505 nm (Fig. 2d, dashed blue lines). The second, more engaging
possibility was that treatment would be sufficient to expand sensory 
capacity in monkeys, providing them with trichromatic vision. In this 
case, the animals’ post-therapy results would appear similar to Fig. 2e, 
obtained from a trichromatic female control monkey. 

Daily testing continued after treatment. After about 20 weeks post-injection
(Fig. 3a, arrow), the trained monkeys’ thresholds for blue-green
and red-violet (dominant wavelengths of 490 and −499 nm,
respectively; Fig. 3b, c) improved, reducing to an average of 0.08 units
in u′, v′ colour space, indicating that they gained trichromatic vision.
This time point corresponded to the same period in which robust 
levels of transgene expression were reported in the squirrel monkey’. 
A trichromatic female monkey and untreated dichromatic monkeys 
were tested in parallel. As expected, the female had low thresholds for 
all colours, averaging <0.03 units in u′, v′ colour space, but the
untreated dichromats always failed to discriminate between dominant
wavelengths of 490 nm (Fig. 3a, triangle) and −499 nm, indicating
a clear difference between treated and untreated monkeys.

Early experiments in which we obtained negative results served as 
‘sham controls’, demonstrating that acquiring a new dimension of 
colour vision requires a shift in spectral sensitivity that results from 
expression of an L pigment in a subset of M cones. Using similar 
subretinal injection procedures, we delivered fewer viral particles of 
an L-opsin-coding rAAV2/5 virus with an extra 146-base-pair (bp) 
segment near the splice donor/acceptor site that had been carried 
over from the cloning vector and that was absent in the GFP-coding 
rAAV2/5 virus. The 146-bp segment contained an ATG and a dupli- 
cate messenger RNA start site that may have interfered with expres- 
sion (see Methods). Three monkeys received injections of this vector, 
containing an average of 1.7 × 10¹² virus particles per eye, and no
reliable changes in spectral sensitivity were measured using the ERG. 
One animal was also tested behaviourally and his colour vision was 
unchanged from baseline 1 year after injection. In subsequent experi- 
ments reported here, we removed the extra 146-bp segment and also 
increased the amount of viral particles delivered per eye by approximately
16-fold, to 2.7 × 10¹³. Negative results from earlier injections
demonstrated that the subretinal injection procedure itself does not 
produce changes in the ERG or in colour vision. 

The change in spectral sensitivity measured with the mf-ERG is 
necessary but not sufficient to produce a new colour vision capacity. 
For example, individuals with L but no M cones (termed deuteranopes) 
have a relatively enhanced sensitivity to red light, but they are still as 

Figure 3 | Gene therapy produced trichromatic colour vision. a, Time
course of thresholds for the blue-green confusion colour, dominant
wavelength of 490 nm (circles), and a yellowish colour, dominant
wavelength of 554 nm (squares). A logarithmic scale was used to fit high
thresholds for the dominant wavelength of 490 nm; significant improvement
occurred after 20 weeks. Enclosed data points denote untreated dichromatic
monkey thresholds, dominant wavelengths of 490 nm (triangle) and 554 nm
(diamond). b, c, Comparison of pre-therapy (open circles, solid line) and
post-therapy (solid dots, dashed line) thresholds. Enclosed data points are
dominant wavelength 490 nm thresholds when tested against a red-violet
background (dominant wavelength of −499 nm); pink
triangles show trichromatic female control thresholds. Error bars represent
s.e.m.; n varied from 7–11.


dichromatic as individuals with M but no L cones (protanopes) in that 
they are unable to distinguish particular ‘colours’ from grey. To verify 
that the behavioural change observed in animals expressing the L 
pigment transgene was not purely a shift in spectral sensitivity (see 
Fig. 2d), monkey 1 was also tested on dominant wavelengths of 496 
and 500 nm, and monkey 2 was tested on dominant wavelengths of 496 
and 507 nm. Together, these dominant wavelengths span the possible 
confusion points for deuteranopes and protanopes and for any inter- 
mediate dichromatic forms that could arise from expressing combina- 
tions of L and M pigments. As shown in Fig. 3b, c, both monkeys’ 
measured thresholds for these extra hues were similar to their thresh- 
olds for a dominant wavelength of 490 nm, demonstrating that they 
now lacked a spectral neutral point and had become truly trichromatic.
Furthermore, treated monkeys were able to discriminate blue-green 
(dominant wavelength of 490 nm) when it was tested against a red-violet
(dominant wavelength of −499 nm) background, instead of the
grey background, indicating that the monkeys’ newly-acquired ‘green’ 
and ‘red’ percepts were distinct from one another. The treated monkeys’ 
improvement in colour vision has remained stable for more than 2 years 
and we plan to continue testing the animals to evaluate long-term 
treatment effects. 
Classic experiments in which visual deprivation of one eye during
development caused permanent vision loss’ led to the idea that inputs 
must be present during development for the formation of circuits to 
process them. From the clear change in behaviour associated with 
treatment, compared both between and within subjects, we conclude 
that adult monkeys gained new colour vision capacities because of 
gene therapy. These startling empirical results provide insight into 
the evolutionary question of what changes in the visual system are 
required for adding a new dimension of colour vision. Previously, it 
seemed possible that a transformation from dichromacy to trichro- 
macy would require evolutionary/developmental changes, in addi- 
tion to acquiring a third cone type. For example, L- and M-opsin- 
specific genetic regulatory elements might have been required to 
direct the opsins into distinct cone types” that would be recognized 
by L- and M-cone-specific retinal circuitry'®, and to account for 
cortical processing, multi-stage circuitry’ might have evolved spe- 
cifically for the purpose of trichromacy. However, our results 
demonstrate that trichromatic colour vision behaviour requires 
nothing more than a third cone type. As an alternative to the idea 
that the new dimension of colour vision arose by acquisition of a new 
L versus M pathway, it is possible that it exploited the pre-existing 
blue-yellow circuitry. For example, if the addition of the third cone 
class split the formerly S versus M receptive fields into two types with 
differing spectral sensitivities, this would obviate the need for neural 
rewiring as part of the process of adopting new colour vision. 

Some form of inherent plasticity in the mammalian visual system 
can be inferred from the acquisition of new colour vision, as was also 
demonstrated in genetically engineered mice’’; however, the point has 
been made that such plasticity need not indicate that any rewiring of 
the neural circuitry has occurred"’. Similarly, given the fact that new 
colour vision behaviour in adult squirrel monkeys corresponded to 
the same time interval as the appearance of robust levels of transgene 
expression, we conclude that rewiring of the visual system was not 
associated with the change from dichromatic to trichromatic vision. 

Treated adult monkeys unquestionably respond to colours that were 
previously invisible to them. The internal experiences associated with 
the marked change in discrimination thresholds measured here cannot 
be determined; therefore, we cannot know whether the animals experi- 
ence new internal sensations of red and green. Nonetheless, we do 
know that evolution acts on behaviour, not on internalized experi- 
ences, and we suggest that gene therapy recapitulated what occurred 
during evolution of trichromacy in primates. These experiments 
demonstrate that a new colour-vision capacity, as defined by new 
discrimination abilities, can be added by taking advantage of pre-exist- 
ing neural circuitry and, internal experience aside, full colour vision 
could have evolved in the absence of any other change in the visual 
system except the addition of a third cone type. 

Gene therapy trials are underway for Leber’s congenital amaur- 
osis'*"'*. Thus far, treatment has been administered to individuals 
who have suffered retinal degeneration from the disease. The experi- 
ments reported here are, to our knowledge, the first to use gene 
therapy in primates to address a vision disorder in which all photo- 
receptors are intact and healthy, making it possible to assess the full 
potential of gene therapy to restore visual capacities. Treatment 
allowing monkeys to see new colours in adulthood provides a striking 
counter-example to what occurs under conditions of monocular 
deprivation. For instance, it is impossible to restore vision in an adult 
who had grown up with a unilateral cataract. Future technologies will 
allow many opportunities for functions to be added or restored in the 
eye. Although some changes may produce outcomes analogous to 
monocular deprivation, we predict that others, like gene therapy for 
red-green colour blindness, will provide vision where there was 
previously blindness. 


METHODS SUMMARY 
Confocal microscopy. The animal in Fig. 1c, d succumbed to respiratory illness, 
unrelated to gene therapy, approximately 2 years and 3 months after injection. 
The retina was fixed in 4% paraformaldehyde in PBS, and rinsed in PBS with
10% and 30% sucrose. It was sequentially incubated with 10% normal donkey 
serum, rabbit monoclonal antibody to M/L-opsin (Chemicon, AB5405), and a 
Cy3 (red)-conjugated donkey anti-rabbit antibody (Jackson Immunoresearch). 
Confocal images were analysed using ImageJ (http://rsbweb.nih.gov). In the
middle panel of Fig. 1e, magenta dots mark cone locations, and the red
anti-M/L-opsin antibody staining was removed to show GFP-expressing (green) cells
more clearly. 

Behavioural colour vision assessment. A three-alternative forced-choice model 
in which position and saturation of the stimulus was randomized between trials 
was used. Monkeys had to discriminate the location of a coloured patch of dots 
that varied in size and brightness, surrounded by similarly varying grey dots. 
When animals touched the coloured target, a positive tone sounded and a juice 
reward was given; the next stimulus appeared immediately. (The squirrel 
monkey shown in Fig. 2c is drinking a reward from a previous trial). If the wrong 
position was chosen, a negative tone sounded, and a 2-3-s ‘penalty time’ 
occurred before the next trial. 

For each hue, monkeys were tested on up to 11 different saturations ranging 
from 0.01 to 0.11 in u′, v′ colour space (CIE 1976) and a threshold was calculated,
which was taken as the saturation required to reach a criterion of 57% correct, the 
value determined to be significantly greater than chance (33% correct, P = 0.05); 
see ref. 6 for full details. All procedures were conducted in accordance with the 
guidelines of the US National Institutes of Health about the care and use of 
animals. 
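
The criterion above follows from asking how many correct choices are needed before performance on a three-alternative task is unlikely to be guessing. A minimal sketch of that reasoning, assuming a one-sided exact binomial test, is given below. The function names are hypothetical, and the number of trials per saturation is not stated here, so no attempt is made to reproduce the 57% figure itself.

```python
from math import comb


def binomial_tail(n_trials, k_correct, p_chance):
    """P(X >= k_correct) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        comb(n_trials, i) * p_chance ** i * (1 - p_chance) ** (n_trials - i)
        for i in range(k_correct, n_trials + 1)
    )


def criterion_correct(n_trials, p_chance=1 / 3, alpha=0.05):
    """Smallest number of correct responses that beats chance at level alpha.

    Returns None when even a perfect score is not significant.
    """
    for k in range(n_trials + 1):
        if binomial_tail(n_trials, k, p_chance) <= alpha:
            return k
    return None
```

Dividing the returned count by the number of trials gives the criterion as a proportion correct; with more trials the criterion moves closer to the 33% chance level.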


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Viral vector. CHOPS2053 was a 2.1-kilobase (kb) fragment containing the locus 
control region and proximal promoter upstream of the human X-chromosome 
opsin gene array”'’. These elements (also known as pR2.1) have been shown to 
target transgene expression to mammalian L/M cones*'*. RHLOPS was a 1.2-kb 
fragment containing recombinant human L-opsin cDNA. A clone of the human 
L-opsin cDNA"’, known as hs7, was generously provided by J. Nathans. The 
QuickChange kit (Stratagene) was used to convert codon 180 so that it would 
encode a human L pigment maximally sensitive to 562 nm”°. The virus was made 
using the genome from rAAV serotype 2 and the capsid from serotype 5, and the 
preparation had 9 × 10¹³ DNase-resistant vector genome containing particles
per ml. To prevent vector aggregation, 0.014% Tween-20 was added to the final
vector preparation. A total of 2.7 × 10¹³ viral particles were injected per eye.

An earlier version of the L-opsin-coding rAAV2/5 used in previous un- 
successful experiments contained an extra 146-bp segment between the splice 
donor/acceptor site and the translational start codon of the L-opsin gene that 
had been carried over from the cloning vector. Because we were concerned that 
this fragment may have interfered with transgene expression, a second version of 
L-opsin rAAV2/5 in which the extra 146 bp had been removed was used in later 
experiments described here. In addition to modifying the vector, we also 
increased the amount of viral particles delivered per eye by approximately 16-fold,
from 1.7 × 10¹² to 2.7 × 10¹³. Thus, we cannot conclude from this set of
experiments what exact titre of viral particles was required to produce the effects 
on colour vision behaviour, or exactly what effects, if any, the extra 146 bp had on 
transgene expression in earlier unsuccessful attempts. 
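
The dose figures can be cross-checked with simple arithmetic. The sketch below takes the titre as 9 × 10¹³ particles per ml, the injected volume as three 100-µl injections per eye, and the earlier dose as 1.7 × 10¹² particles per eye. The helper name is hypothetical and the calculation is only a consistency check, not part of the original methods.

```python
def particles_delivered(titre_per_ml, volume_ul):
    """Vector particles contained in `volume_ul` microlitres of preparation."""
    return titre_per_ml * volume_ul / 1000.0


# Three 100-µl subretinal injections per eye at the stated titre.
later_dose = particles_delivered(9e13, 3 * 100)

# Dose used in the earlier, unsuccessful experiments (particles per eye).
earlier_dose = 1.7e12

fold_increase = later_dose / earlier_dose
```

Under these assumptions the later dose comes to about 2.7 × 10¹³ particles per eye and the ratio to roughly 16, matching the figures quoted in the text.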

The single-stranded DNA genome of conventional rAAV vectors, including 
rAAV2/5 used here, is devoid of Rep coding sequences. Thus, the vector genome 
is stabilized predominantly in an episomal form; however, the potential for 
integration exists*'. According to NIH guidelines, the viral vector used here is 
rated biosafety level 1 (BSL1) and animal biosafety level 1 (ABSL1), meaning that
no special precautions were required in handling the virus or animals treated 
with the virus. After treatment, squirrel monkeys had an increase in AAV antibody
titres, ranging from 4–12-fold. Antibody titres remained unchanged in
untreated control animals who were housed with treated animals. 

Subretinal injections. Subretinal injections were performed by a vitreo-retinal 
surgeon (T.B.C.) using a KDS model 210 syringe pump under a stereomicroscope.
A 500-µl Hamilton Gastight (1750TTL) Luer Lock syringe was connected
to 88.9 cm of 30-gauge teflon tubing with male Luer Lock adapters at both ends
(Hamilton 30TF double hub), which was then connected to a 30-gauge Becton
Dickinson Yale regular bevel cannula (ref 511258) that was manually bent to
produce a 135° angle 1.5 mm from the tip. All components were sterilized before
use. The syringe and tubing were filled with sterile lactated Ringers solution to
produce a dead volume of approximately 210 µl. Just before injection, 300 µl of
rAAV was withdrawn using a rate of 100 µl min⁻¹.

Squirrel monkeys were anaesthetized using intramuscular injections of ketamine
(15 mg kg⁻¹) and xylazine (2 mg kg⁻¹); atropine (0.05 mg kg⁻¹) was also given to
reduce airway secretions. The eye was dilated with 2–3 drops of tropicamide (1%)
and treated with one drop each of betadine (5%), vigamox (0.5%) and proparacaine
(1%). Subconjunctival injection of 0.1 ml lidocaine (2%) was given, and the
anterior portion of the eye was exposed by performing a temporal canthotomy
followed by limited conjunctival peritomy. Eyelids were held open with a speculum
designed for premature infants. A temporal sclerotomy was made 1 mm posterior
to the limbus with a 27-gauge needle, through which the injection cannula was
inserted. Three subsequent 100-µl injections were made at different subretinal
locations using an infusion rate of 1,060 µl min⁻¹. Post-procedure, 0.05 ml each
of decadron (10 mg ml⁻¹), kenalog (40 mg ml⁻¹) and cephazolin (100 mg ml⁻¹)
were injected subconjunctivally; one drop each of betadine (5%) and vigamox
(0.5%) and a 0.6-cm strip of tobradex (0.3% tobramycin, 0.1% dexamethasone)
ointment were applied topically; 10–20 ml of subcutaneous fluids (sterile
lactated Ringers) was also given. Steroids and analgesics were administered
as needed post-procedure for potential inflammation
or discomfort.
STING regulates intracellular DNA-mediated, type I
interferon-dependent innate immunity


Hiroki Ishikawa¹, Zhe Ma¹ & Glen N. Barber¹


The innate immune system is critical for the early detection of 
invading pathogens and for initiating cellular host defence counter- 
measures, which include the production of type I interferon 
(IFN)'°. However, little is known about how the innate immune 
system is galvanized to respond to DNA-based microbes. Here we 
show that STING (stimulator of interferon genes) is critical for the 
induction of IFN by non-CpG intracellular DNA species produced 
by various DNA pathogens after infection’. Murine embryonic 
fibroblasts, as well as antigen presenting cells such as macrophages 
and dendritic cells (exposed to intracellular B-form DNA, the DNA 
virus herpes simplex virus 1 (HSV-1) or bacteria Listeria mono- 
cytogenes), were found to require STING to initiate effective IFN 
production. Accordingly, Sting-knockout mice were susceptible 
to lethal infection after exposure to HSV-1. The importance of 
STING in facilitating DNA-mediated innate immune responses 
was further evident because cytotoxic T-cell responses induced by 
plasmid DNA vaccination were reduced in Sting-deficient animals. 
In the presence of intracellular DNA, STING relocalized with 
TANK-binding kinase 1 (TBK1) from the endoplasmic reticulum 
to perinuclear vesicles containing the exocyst component Sec5 (also 
known as EXOC2). Collectively, our studies indicate that STING is 
essential for host defence against DNA pathogens such as HSV-1 
and facilitates the adjuvant activity of DNA-based vaccines. 

Nucleic acid species inadvertently generated by microbes after 
infection are potent inducers of cellular innate immune defences 
important for protection of the host'*. Although considerable pro- 
gress has been made into unravelling how RNA viruses induce type I 
IFN, required for triggering the production of anti-viral genes, little is
known at the molecular level about the induction of IFN by DNA 
pathogens such as herpes simplex virus 1 (HSV-1) or by intracellular
bacteria or parasites*’°. Toll-like receptor 9 (TLR9) is known to 
recognize CpG DNA to trigger IFN production in plasmacytoid 
dendritic cells (pDCs), and Z-DNA binding protein 1 (ZBP1, also 
known as DAI) was recently shown to be able to stimulate IFN tran- 
scription, but was found to be largely redundant in studies using 
DAI-deficient cells and mice'’'’. Recently, a DNA receptor AIM2 
was found to be important for ASC (also known as PYCARD)- 
dependent inflammasome mediated production of IL1B, but was 
not required for type I IFN production'*”*. Thus, other innate signalling
pathways that recognize intracellular non-CpG DNA species
must exist to facilitate type I IFN production.

We previously demonstrated for the first time a role for STING 
(also referred to as TMEM173, MPYS and MITA), an endoplasmic 
reticulum (ER) resident transmembrane protein, in facilitating the 
production of type I IFN*’’”*. To evaluate the importance of STING 
in mediating DNA-induced innate immune responses, we used wild 
type (+/+) or Sting−/− low passage number mouse embryonic fibroblasts
(MEFs) and compared the induction of type I IFN (IFNβ) in
response to a variety of DNA ligands. Our results indicated that
STING was essential for inducing IFNβ in response to transfected
viral DNA (adenovirus, Ad5; herpes simplex virus, HSV-1 and -2),
purified Escherichia coli DNA, calf thymus (CT) DNA, and interferon
stimulatory DNA (ISD; double-stranded 45-base-pair oligonucleotides
lacking CpG sequences) (Fig. 1a). Complete abrogation of IFNβ
production was also observed after transfection of synthetic double-stranded
DNA (poly(dG-dC)•poly(dC-dG), hereafter referred to as
poly(dGC:dGC)) in Sting−/− MEFs, and slight IFNβ production was
observed using poly(dAT:dAT), probably due to STING-independent,
RIG-I (also known as DDX58)-dependent signalling*'””. The loss of
STING did not significantly affect poly(I:C)-mediated type I IFN production,
which is largely governed by MDA5 (ref. 5). Concomitant
analysis further indicated a marked reduction in IL6 production in
Sting−/− MEFs compared to controls after similar DNA transfections
(Fig. 1a). ISD-mediated production of Ifnb and Ifna2 messenger
RNA was not detectable in Sting−/− MEFs compared to controls
(Fig. 1b). Translocation of IRF3 or IRF7 was thus not observed in ISD-transfected
Sting−/− MEFs, indicating that STING probably functions
in mediating intracellular-DNA-triggered IFN production upstream
of TBK1 (Fig. 1c and Supplementary Fig. 1). NF-κB signalling was also
defective in Sting−/− MEFs after exposure to transfected ISD (Supplementary
Fig. 1). Given this, we next examined the importance of
STING in facilitating intracellular-DNA-mediated production of
type I IFN in antigen presenting cells. This analysis indicated that
Sting−/− macrophages transfected with ISD, or infected with the
DNA pathogens HSV-1 or Listeria monocytogenes, were greatly defective
in their ability to manufacture type I IFN (Fig. 1d). However, the
cleavage of pro-caspase 1 and production of active IL1β, which is
AIM2-dependent, was unaffected by the loss of STING (Fig. 1e and
Supplementary Fig. 1). Thus, STING functions independently of the
AIM2 ‘inflammasome’ pathway. Further analysis also indicated that
STING was required for efficient DNA-mediated production of type I
IFN in granulocyte–macrophage dendritic cells (GM-DCs), as well as
pDCs (FLT3-ligand-induced dendritic cells, FLT3-DCs) (Fig. 1f, g).
However, exogenous CpG DNA remained able to induce type I IFN in
Sting−/− FLT3-DCs compared to controls, indicating that TLR9 functions
independently of the STING pathway (Fig. 1g). The induction of
IL6 in response to intracellular DNA was also reduced in Sting−/−
macrophages (Supplementary Fig. 1). However, HSV-1 and CpG
DNA remained able to induce IL6 in Sting−/− macrophages, probably
through TLR9-dependent signalling (Supplementary Fig. 1)".
Furthermore, we noted that STING seemed to be essential for the
production of type I IFN by cytomegalovirus (CMV), vaccinia virus
(VVΔE3L) and baculovirus (Supplementary Fig. 1). STING therefore
seems critical for intracellular-DNA-mediated production of type I
IFN in fibroblasts, macrophages, conventional dendritic cells as well
as pDCs.

We next evaluated the in vivo importance of STING in facilitating 
effective host defence against select virus infection. Principally, 
Figure 1 | STING is essential for intracellular DNA-mediated type I IFN
production. a, MEFs were transfected with 1 µg ml⁻¹ of DNA ligands (with
Lipofectamine 2000) for 16 h, and IFNβ or IL6 were measured. b, MEFs were
transfected with ISD for 4 h and Ifnb or Ifna2 mRNA levels were measured.
c, MEFs treated as in b were stained with an antibody for IRF3 translocation.
Original magnification, ×40. d, Bone-marrow-derived macrophages were
transfected with poly(dAT:dAT), poly(I:C) or ISD, or infected with HSV-1
(multiplicity of infection (m.o.i.) 10) or Listeria (m.o.i. 10) for 16 h, and


Sting−/− or control mice were infected intravenously (i.v.) with
HSV-1 and survival was monitored. The Sting-knockout mice died
within 7 days of HSV-1 infection (Fig. 2a), whereas 80% of similarly
infected wild-type mice survived. Significant amounts of HSV-1 were
detected in the brain of infected Sting−/− mice, but not in controls at
5 days after infection (Fig. 2b). Analysis of serum from the infected
Sting−/− animals indicated a profound defect in the production of
type I IFN at 6 h after infection, compared to infected control animals
(Fig. 2c, d and Supplementary Fig. 2). RANTES and IL6 levels were
similarly markedly reduced in Sting−/− mice at the same time point
(Fig. 2e, f). Moreover, Sting−/− mice were found to be more sensitive
to HSV-1 after intravaginal administration of HSV-1 (Supplementary
Fig. 2). These data indicate that STING is necessary, in vivo, for
the effective production of type I IFN and is essential for efficient
protection against HSV-1 infection.
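
The figure legends report significance using Student’s t-test on small groups (for example, n = 3 animals per genotype). As a reminder of what that comparison involves, the sketch below computes the pooled-variance t statistic and its degrees of freedom. The function name and the sample values are placeholders chosen for illustration and are not measurements from this study.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (equal variances assumed) and its d.f."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance from both groups.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / dof
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, dof


# Placeholder readings for two groups of three animals (arbitrary units).
control_group = [4.0, 5.0, 6.0]
knockout_group = [1.0, 2.0, 3.0]
t_value, dof = student_t(control_group, knockout_group)
```

With 4 degrees of freedom, a two-sided test at P < 0.05 requires |t| to exceed about 2.776, so with only three animals per group the difference between means must be large relative to the spread within groups.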

Because we had previously seen, in vitro, a defect in the ability of the
negative-stranded virus vesicular stomatitis virus (VSV) to induce
type I IFN in the absence of STING, we next examined the in vivo
importance of STING in protecting against VSV-related disease*. We
observed that Sting−/− animals infected with VSV were also significantly
sensitive to lethal infection compared to controls (Fig. 2g).
Defects in type I IFN production were seen in Sting-knockout mice
at early time points (6 h), although less so at 24 h (Fig. 2h, i and
Supplementary Fig. 2). Thus, STING is necessary for efficient, early
induction of type I IFN production and is required for protection
against infection with the negative-stranded virus VSV, possibly by
regulating the RIG-I and IPS-1 (also known as MAVS, VISA and
CARDIF) pathway**"°.

We did not observe a significant requirement for STING in facili- 
tating poly(I:C) or EMCV (encephalomyocarditis virus, a positive- 
stranded flavivirus)-mediated IFN transcription, indicating that 
IFNβ was measured. e, Macrophages were infected with HSV-1 for 16 h and
IL1β was measured. f, GM-colony stimulating factor (CSF)-induced
dendritic cells (GM-DCs) were treated as in d, and IFNβ or IFNα was
measured after 16 h. g, FLT3-stimulated dendritic cells were treated as in
f (exogenous CpG oligodeoxynucleotides (ODN) (1 µg ml⁻¹) were also
used). *P < 0.05, Student’s t-test. Error bars indicate s.d. ND, not
determined.


STING may not influence MDA5 function (Fig. 1a and Supplementary
Fig. 2)*. However, it is known that some flaviviruses such as
hepatitis C virus (HCV) can activate the RIG-I pathway, signalling
which seems to be influenced by STING*”. In this regard, databank
analysis indicated that the flaviviruses yellow fever virus (YFV) and
Dengue virus encode a product NS4B that exhibits strong homology
with the amino terminus of STING (amino acids 125-222)
(Supplementary Fig. 3). This region was found to be critical for
STING function (Supplementary Fig. 3). Various flaviviral NS4B
products have been shown to localize to the ER of the cell and to
suppress the induction of type I IFN, although the mechanisms
remain unclear™*. Our analysis here indicates that NS4B was able
to inhibit STING activity, probably by direct association (Fig. 2j–l
and Supplementary Fig. 3). Thus, STING may be targeted by certain
viruses for suppression.

TBK1 has been shown to have an important role in mediating the 
adjuvant activity of DNA vaccines in vivo’*. TBK1 activation in res- 
ponse to plasmid DNA was found to occur in the absence of the DNA 
sensors TLR9 or DAI, indicating that other pathways exist to facilitate 
DNA-mediated immunization’*”*. To evaluate whether STING was 
involved in this signalling pathway, Sting−/− or control mice were
immunized with plasmid DNA encoding the ovalbumin gene.
Although we noted normal B- and T-cell subsets in unstimulated
Sting−/− animals, after immunization Sting−/− mice showed significantly
less serum ovalbumin (OVA)-specific IgG compared to controls
(Fig. 3a and Supplementary Fig. 4). Furthermore, spleen CD8⁺
T-cell frequency and IFNγ secretion were markedly reduced in
Sting−/− mice after immunization, compared to wild-type mice
(Fig. 3b, c). Because immunoglobulin responses to OVA peptide 
were normal, these data emphasize that the STING-governed DNA 
sensor pathway is essential for efficient DNA-vaccine-induced T-cell 
Figure 2 | STING is required for effective in vivo host defence. a, Sting-deficient
animals (Sting−/−) or littermate controls (Sting+/+) (n = 7;
approximately 8 weeks of age) were infected with HSV-1 (1 × 10⁷ i.v.) and
survival was monitored. b, Sting−/− or control mice were infected with HSV-1
as in a and brains were retrieved after 5 days for HSV-1 plaque assays.
p.f.u., plaque-forming units. c, d, Serum from animals (n = 3) infected with
HSV-1 (1 × 10⁷ i.v.) was analysed for IFNβ (c) or IFNα (d) production after
6 h. e, f, Serum from animals infected as in c was analysed for RANTES
(e) and IL6 (f) production. g, Sting−/− or control mice (n = 6) were infected
with VSV (5 × 10⁷ i.v.) and survival was monitored. h, i, Mice (n = 3) were
treated as in g and IFNβ (h) or IFNα (i) was measured after 6 h. j, Increasing
amounts of YFV NS4B were co-transfected into 293T cells with human
STING or the amino terminus of RIG-I (ΔRIG-I, residues 1-284) and
transfected IFNβ promoter-driven luciferase (IFNβ-Luc) was measured
after 36 h. k, Immortalized MEFs were transfected with YFV NS4B for 24 h,
infected with VSVΔM* (m.o.i. 1) for 16 h, and IFNβ was measured. l, 293
cells were transfected with NS4B–HA for 36 h and after
immunoprecipitation (IP) with anti-haemagglutinin antibody, were
analysed by western blot (WB) using anti-STING serum. *P < 0.05,
Student’s t-test. Error bars indicate s.d.


responses to antigen (Fig. 3 and Supplementary Fig. 4). Similar 
studies also indicated that STING had a key role in facilitating 
T-cell responses to the DNA virus vaccinia expressing ovalbumin 
(VV-OVA). Our data emphasize the importance of STING in innate
immune signalling processes required for DNA adjuvant activity 
(Fig. 3d). 

We previously demonstrated that STING is an ER resident protein 
and member of the TRAP (translocon associated protein) complex that 
can associate with RIG-I and the mitochondrial innate immune sig- 
nalling adaptor IPS-1 (refs 4, 26). Physical association of mitochondria 

Figure 3 | STING is required for effective DNA-mediated adaptive immune 
responses. a, Sting−/− or control (Sting+/+) mice (n = 5; approximately 
8 weeks of age) were immunized twice (100 μg i.m.) by electroporation with 
a DNA vaccine encoding ovalbumin. Serum was measured for anti-OVA 
IgG. b, c, Mice were treated as in a and spleen CD8⁺ IFNγ⁺ cells were 
measured by fluorescence-activated cell sorting (FACS; b), and anti-OVA- 
specific IFNγ production was measured by ELISA after stimulation of 
splenocytes using SIINFEKL peptide (c). d, Sting−/− mice or controls (n = 4; 
approximately 8 weeks of age) were infected with vaccinia expressing 
ovalbumin (VV-OVA; 5 × 10° i.v.) and spleen anti-OVA-specific IFNγ 
production was measured by ELISA. *P < 0.05, Student’s t-test. Error bars 
indicate s.d. All experiments were repeated twice. 


and the ER, referred to as mitochondria-associated ER membrane 
(MAM), is important for transmission of Ca²⁺ to the mitochondria 
and for oxidative metabolism²⁷. We thus examined whether STING 
could associate with MAMs. First, we reconstituted haemagglutinin 
(HA)-tagged STING into Sting−/− MEFs to follow endogenous STING 
localization using a haemagglutinin antibody. This analysis confirmed 
that STING is predominantly associated with the ER as determined by 
calreticulin marker co-staining (Fig. 4a). Mitotracker co-staining also 
indicated that STING may co-localize with mitochondria associated 
with the ER (Fig. 4b). The association of endogenous STING with 
the ER was also confirmed using anti-STING serum (Supplementary 
Fig. 5). Fractionation analysis subsequently demonstrated that 
STING is associated with microsomes, a complex of continuous mem- 
branes that comprise the ER, Golgi and transport vesicles (Fig. 4c). 
Endogenous STING was found to fractionate with MAMs and mito- 
chondria fractions under non-stimulated conditions in MEFs (Fig. 4c). 
Calreticulin, known to be a chaperone involved in regulating the asso- 
ciation of the ER and mitochondria, was observed to fractionate 
similarly²⁷. These data may indicate that STING could associate with 
IPS-1 by MAM interaction⁴. Interestingly, after HSV-1 infection, 
STING was shown to become predominantly associated only with 
microsome fractions (Fig. 4c). To clarify these observations, we 
infected STING-HA MEFs with HSV-1, or transfected these cells with 
stimulatory ISD or negative-control single-stranded DNA (ssDNA). 
These results indicated that in response to HSV-1 infection or ISD 
transfection, STING translocated from the ER and predominantly 
congregated to perinuclear, non-ER microsome compartments in 
the cell (Fig. 4d and Supplementary Figs 5 and 6). Brefeldin A, but 
not chloroquine, blocked STING trafficking, indicating that STING 
relocates from the ER via the Golgi to vesicles in the perinuclear region 
(Supplementary Fig. 5). This trafficking, in response to intracellular 
DNA, was similarly observed for TBK1, which we have previously 
shown to associate with STING⁴ (Fig. 4e). Notably, in the absence of 
STING, TBK1 failed to relocate to perinuclear regions in response to 
ISD transfection (Supplementary Fig. 7). 

We further observed that in the presence of DNA, STING mostly 
localized with the early endosome marker protein EEA1 and recycling 

Figure 4 | STING translocates from the ER to Sec5-containing vesicles. 
a, Sting−/− MEFs, stably reconstituted with haemagglutinin-tagged mouse 
STING (mSTING-HA) were stained using haemagglutinin (green) and a 
calreticulin (red) antibody. b, STING-HA MEFs were stained for 
STING-HA (green), calreticulin (blue) or Mitotracker (red) and three- 
dimensional reconstruction images were taken. c, Immunoblot analysis of 
fractionation experiments of uninfected or HSV-1-infected (m.o.i. 10; 4 h) 
MEFs. Endogenous STING was detected using an anti-STING antibody. 
Calreticulin detects ER, Sigma1R detects MAM, and COXIV detects 
mitochondria. d, Haemagglutinin (green) or calreticulin (red) staining of 
mSTING-HA MEFs after treatment with transfected ISD (1 μg ml⁻¹), 


endosome marker transferrin receptor (TFR; Fig. 4f and Supplementary 
Fig. 6). TBK1 has also been demonstrated to associate with Sec5, a 
component of the eight-subunit exocyst complex that facilitates 
vesicular transport processes²⁸. After intracellular DNA stimulation, 
STING was found to strongly colocalize with Sec5, which has also 
been demonstrated to associate in perinuclear endosome compartments 
(Fig. 4g)²⁹. The RALB and Sec5 pathway has been previously 
shown to be required for efficient Sendai-virus-mediated type I IFN 
production²⁸. However, our data here indicate that STING and TBK1 
complexes may traffic to endosome compartments to associate with 
Sec5/exocyst components and facilitate the production of type I IFN 
in response to intracellular DNA. To evaluate whether Sec5 also 
modulates the production of IFN in response to ISD, we suppressed 
Sec5 production in normal MEFs using RNA interference (RNAi). 
This study indicated that in the absence of Sec5, ISD-mediated IFN 
production was significantly impaired (Fig. 4h, i). A similar effect 
was observed after knockdown of Trapb (also known as Ssr2) and 
Sec61b, components of the TRAP complex (Fig. 4h, i and Supplementary 
Fig. 8). Our data thus indicate that intracellular DNA 
may induce STING to complex with TBK1 and traffic to Sec5-containing 
endosome compartments, events that facilitate the production 
of type I IFN. 

transfected ssDNA (1 μg ml⁻¹) or HSV-1 infection as in c. e, mSTING-HA 
MEFs were transfected with or without ISD and cells were stained with 
haemagglutinin (green), calreticulin (blue) and a TBK1 (red) antibody. 
f, mSTING-HA MEFs were transfected as in e and stained with 
haemagglutinin (green) and a TFR (red) antibody. g, mSTING-HA MEFs 
were transfected as in e and stained with haemagglutinin (green) and a Sec5 
antibody (red). h, i, MEFs were treated with RNAi to Trapb, Sting or Sec5 for 
72 h and transfected with ISD. IFNβ mRNA and protein were measured at 4 
and 16 h, respectively. *P < 0.05, Student’s t-test. Error bars indicate s.d. 
Scale bars, 10 μm. 


In conclusion, we demonstrate that STING is essential for the 
recognition of intracellular DNA and efficient production of type I 
IFN in all cell types examined. Loss of STING renders mice susceptible 
to lethal DNA virus infection (HSV-1). However, STING also 
facilitates host defence responses to negative-stranded viruses such as 
VSV, plausibly through RIG-I and IPS-1-MAM translocon interactions. 
Although STING-independent, VSV-mediated type I IFN-induction 
pathways clearly exist, they do not seem to be sufficient 
on their own to protect mice against lethal VSV infection. We conclude 
that in response to intracellular DNA, STING and TBK1 complexes 
traffic to endosomal compartments to associate with exocyst 
components including Sec5, resulting in the induction of type I IFN. 


METHODS SUMMARY 


Details of mice, cells, viruses, plasmids, antibodies and reagents are given in the 
Methods. ELISA kits were obtained from the following sources: murine IFNβ and 
IFNα (PBL), murine IL6 (R&D systems or Quansys Biosciences), murine IL1β 
and IFNγ (R&D systems), active NF-κB p65 (Active Motif) and murine RANTES 
(Quansys Biosciences). 

DNA vaccine. Mice were immunized with a plasmid encoding OVA by intramuscular 
(i.m.) electroporation (100 μg per mouse). The booster immunization 
was given within 4 weeks of the primary immunization. 

Measurement of OVA-specific immune response. Spleen cells were extracted 
2 weeks after the second immunization and stimulated with synthetic peptide for 
OVA (H-2Kb SIINFEKL, Proimmune) at 10 μg ml⁻¹. After 3 days, the cell culture 
supernatants were collected and analysed for the IFNγ titre by ELISA (R&D 
systems). For intracellular IFNγ staining, stimulated splenocytes were stained 
using FITC-labelled anti-CD8 antibody (BD). The serum anti-OVA antibody 
titre was measured by ELISA. Further details are given in the Methods. 

Confocal microscopy. For localization of Sec5 and LAMP1, cells grown on 
coverslips were fixed in 80%/20% methanol/acetone at −20 °C for 5 min. For 
EEA1 staining, cells were fixed with 4% paraformaldehyde in PBS for 15 min at 
37 °C, and were permeabilized in 0.2% Triton X-100. For staining of other 
proteins, cells were fixed with 4% formaldehyde in DMEM for 15 min at 
37 °C, and were permeabilized in 0.2% Triton X-100. For mitochondria staining, 
living cells were incubated with 300 nM of MitoTracker Red (Invitrogen) for 
45 min at 37 °C. 

RNA interference. Chemically synthesized 21-nucleotide short interfering RNA 
(siRNA) duplexes were obtained from Dharmacon, Inc. The sequences of each 
siRNA oligonucleotide used in this study are given in the Methods. MEFs were 
transfected using an Amaxa nucleofector apparatus (program A-023) and 
Amaxa MEF nucleofector kit 1 according to the manufacturer’s instructions. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


Received 2 August; accepted 3 September 2009. 
Published online 23 September 2009. 


1. Palm, N. W. & Medzhitov, R. Pattern recognition receptors and control of adaptive 
immunity. Immunol. Rev. 227, 221-233 (2009). 

2. Takeuchi, O. & Akira, S. Innate immunity to virus infection. Immunol. Rev. 227, 
75-86 (2009). 

3. Beutler, B. A. TLRs and innate immunity. Blood 113, 1399-1407 (2009). 

4. Ishikawa, H. & Barber, G. N. STING is an endoplasmic reticulum adaptor that 
facilitates innate immune signalling. Nature 455, 674-678 (2008). 

5. Kato, H. et al. Differential roles of MDA5 and RIG-I helicases in the recognition of 
RNA viruses. Nature 441, 101-105 (2006). 

6. Yoneyama, M. et al. The RNA helicase RIG-I has an essential function in double- 
stranded RNA-induced innate antiviral responses. Nature Immunol. 5, 730-737 
(2004). 

7. Kawai, T. et al. IPS-1, an adaptor triggering RIG-I- and Mda5-mediated type I 
interferon induction. Nature Immunol. 6, 981-988 (2005). 

8. Seth, R. B., Sun, L., Ea, C. K. & Chen, Z. J. Identification and characterization of 
MAVS, a mitochondrial antiviral signaling protein that activates NF-κB and IRF3. 
Cell 122, 669-682 (2005). 

9. Meylan, E. et al. Cardif is an adaptor protein in the RIG-I antiviral pathway and is 
targeted by hepatitis C virus. Nature 437, 1167-1172 (2005). 

10. Xu, L. G. et al. VISA is an adapter protein required for virus-triggered IFN-β 
signaling. Mol. Cell 19, 727-740 (2005). 

11. Bauer, S., Pigisch, S., Hangel, D., Kaufmann, A. & Hamm, S. Recognition of nucleic 
acid and nucleic acid analogs by Toll-like receptors 7, 8 and 9. Immunobiology 213, 
315-328 (2008). 

12. Ishii, K. J. et al. TANK-binding kinase-1 delineates innate and adaptive immune 
responses to DNA vaccines. Nature 451, 725-729 (2008). 

13. Takaoka, A. et al. DAI (DLM-1/ZBP1) is a cytosolic DNA sensor and an activator of 
innate immune response. Nature 448, 501-505 (2007). 




NATURE| Vol 461|8 October 2009 


14. Muruve, D. A. et al. The inflammasome recognizes cytosolic microbial and host 
DNA and triggers an innate immune response. Nature 452, 103-107 (2008). 

15. Roberts, T. L. et al. HIN-200 proteins regulate caspase activation in response to 
foreign cytoplasmic DNA. Science 323, 1057-1060 (2009). 

16. Hornung, V. et al. AIM2 recognizes cytosolic dsDNA and forms a caspase-1- 
activating inflammasome with ASC. Nature 458, 514-518 (2009). 

17. Fernandes-Alnemri, T., Yu, J. W., Datta, P., Wu, J. & Alnemri, E. S. AIM2 activates 
the inflammasome and cell death in response to cytoplasmic DNA. Nature 458, 
509-513 (2009). 

18. Bürckstümmer, T. et al. An orthogonal proteomic-genomic screen identifies 
AIM2 as a cytoplasmic DNA sensor for the inflammasome. Nature Immunol. 10, 
266-272 (2009). 

19. Jin, L. et al. MPYS, a novel membrane tetraspanner, is associated with major 
histocompatibility complex class II and mediates transduction of apoptotic 
signals. Mol. Cell. Biol. 28, 5014-5026 (2008). 

20. Zhong, B. et al. The adaptor protein MITA links virus-sensing receptors to IRF3 
transcription factor activation. Immunity 29, 538-550 (2008). 

21. Ablasser, A. et al. RIG-I-dependent sensing of poly(dA:dT) through the induction 
of an RNA polymerase III-transcribed RNA intermediate. Nature Immunol. 
doi:10.1038/ni.1779 (16 July 2009). 

22. Chiu, Y. H., Macmillan, J. B. & Chen, Z. J. RNA polymerase III detects cytosolic DNA 
and induces type I interferons through the RIG-I pathway. Cell 138, 576-591 
(2009). 

23. Saito, T., Owen, D. M., Jiang, F., Marcotrigiano, J. & Gale, M. Jr. Innate immunity 
induced by composition-dependent RIG-I recognition of hepatitis C virus RNA. 
Nature 454, 523-527 (2008). 

24. Munoz-Jordan, J. L. et al. Inhibition of α/β interferon signaling by the NS4B protein 
of flaviviruses. J. Virol. 79, 8004-8013 (2005). 

25. Spies, B. et al. Vaccination with plasmid DNA activates dendritic cells via Toll-like 
receptor 9 (TLR9) but functions in TLR9-deficient mice. J. Immunol. 171, 
5908-5912 (2003). 

26. Ménétret, J. F. et al. Single copies of Sec61 and TRAP associate with a 
nontranslating mammalian ribosome. Structure 16, 1126-1137 (2008). 

27. Hayashi, T., Rizzuto, R., Hajnoczky, G. & Su, T. P. MAM: more than just a 
housekeeper. Trends Cell Biol. 19, 81-88 (2009). 

28. Chien, Y. et al. RalB GTPase-mediated activation of the IκB family kinase TBK1 
couples innate immune signaling to tumor cell survival. Cell 127, 157-170 (2006). 

29. Spiczka, K. S. & Yeaman, C. Ral-regulated interaction between Sec5 and paxillin 
targets Exocyst to focal complexes during cell migration. J. Cell Sci. 121, 
2880-2891 (2008). 


Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 


Acknowledgements We thank J. Yewdell for VV-OVA, B. Jacobs for VVΔE3L, 
K. Frueh for HCMV, M. Kobayashi for baculovirus, H. Horiuchi for the Sec5 
antibody, Y. C. Weh for Tbk1-knockout MEFs, and S. Nagata, T. Maniatis, J. Hiscott 
and N. Reich for plasmid constructs. This work was supported by NIH grant 
AI079336. 


Author Contributions H.I. and G.N.B. designed the research and analysed the data. 
H.I. performed most experiments. Z.M. performed experiments related to YFV 
NS4B, carried out exocyst RNAi studies and helped with experiments. G.N.B. wrote 
the paper. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. Correspondence and requests for materials should be 
addressed to G.N.B. (gbarber@med.miami.edu). 


©2009 Macmillan Publishers Limited. All rights reserved 


doi:10.1038/nature08476 


METHODS 

Mice, cells, viruses and reagents. Sting-knockout mice on a 129SvEv × C57BL/6J 
background have been described previously⁴. MEFs, bone-marrow-derived macrophages 
and GM-DCs were prepared as described previously⁴. To prepare FLT3-DCs, 
bone marrow cells were cultured in RPMI 1640 medium supplemented with 10% 
FBS, 50 μM 2-mercaptoethanol, 10 mM HEPES, pH 7.4, and 100 ng ml⁻¹ human 
FLT3 ligand (Peprotech) for 8 days. 293T cells were obtained from the American 
Type Culture Collection (ATCC) and were maintained in DMEM medium supplemented 
with 10% FBS. VSV (Indiana strain), VSVΔM and EMCV were described 
previously⁴. HSV-1 (KOS strain) and Listeria monocytogenes (10403 serotype) were 
obtained from ATCC. Vaccinia virus encoding chicken ovalbumin (VV-OVA), 
vaccinia virus E3L deletion mutant (VVΔE3L), human cytomegalovirus (AD169 
strain), and baculovirus (Autographa californica M nucleopolyhedrovirus) were gifts 
from J. Yewdell, B. Jacobs, K. Frueh and M. Kobayashi, respectively. Genomic DNA 
was obtained from the following sources: HSV-1, HSV-2, adenovirus type 5 (ATCC); 
E. coli, and calf thymus (Sigma). Poly(dAT:dAT) and poly(I:C) were obtained from 
Amersham Biosciences. Poly(dGC:dGC) and poly(dA) were obtained from Sigma. 
CpG ODN (ODN 1585) was obtained from Invivogen. For stimulation of cells, 
genomic DNA, polydeoxynucleotides or poly(I:C) were mixed with Lipofectamine 
2000 (Invitrogen) at a ratio of 1:1 (v/w), and then added to cells at a final concentration 
of 1 μg ml⁻¹. LPS was obtained from Invivogen. Brefeldin A and chloroquine 
were obtained from Sigma. 

Plasmids. YFV NS4B sequence was amplified by PCR using pYFM5.2 encoding 
the complete YFV-17D sequence as a template, and was cloned into a pcDNA3 
(Invitrogen) plasmid to generate a carboxy-terminally haemagglutinin-tagged 
expression construct. C-terminally haemagglutinin-tagged STINGΔSP (Δ1-36 
amino acids) and STINGΔTM5 (Δ153-173 amino acids) were amplified by PCR 
and cloned into a pcDNA3 plasmid. The expression plasmid containing chicken 
ovalbumin (OVA) complementary DNA was constructed by cloning of PCR- 
amplified OVA cDNA into pcDNA3. Expression plasmids encoding haemagglutinin-tagged 
murine STING (mSTING-HA), Flag-tagged ΔRIG-I (amino 
acids 1-284), ΔMDA5 (amino acids 1-349) and IRF-7 were described 
previously⁴. p110-Luc (IFNβ-Luc) was obtained from T. Maniatis. pUNO-hsaIRF3 
(IRF3SA) and pUNO-hsaIRF7A (IRF7SA) were obtained from Invivogen. 
pCMV-SPORT6 containing murine DAI was obtained from Open Biosystems. 

Primers. The following primers were used for cloning: YFV NS4B forward, 
5'-GGGGTACCATGAACGAGCTAGGCATGCTGGAG-3'; YFV NS4B reverse, 
5'-CCGCTCGAGCCGGCGTCCAGTTTTCATCTTC-3'; STINGΔSP forward, 
5'-CCCAAGCTTGCCGCCACCATGCTAGGAGAGCCACCAGAGCAC-3'; STINGΔSP 
reverse, 5'-CCGCTCGAGAGAGAAATCCGTGCGGAGAG-3'; OVA forward, 
5'-ATGGGCTCCATCGGCGCAGCAA-3'; OVA reverse, 5'-TTAAGGGGAAACACATCTGCC-3'. 

Antibodies and ELISA. Rabbit polyclonal antibody against STING was described 
previously⁴. The antibody against STING-C was generated by immunizing rabbits 
with recombinant glutathione S-transferase (GST)-hSTING-C (amino acids 
173-379) produced in E. coli. Rabbit polyclonal antibody against Sec5 was a gift 
from H. Horiuchi. Other antibodies were obtained from the following sources: 
caspase-1 p10 (Santa Cruz Biotechnology), calreticulin (ab14234; Abcam), 
Sigma1 receptor (ab53852; Abcam), TBK1 (EP611Y; Abcam), COXIV 
(ab16056; Abcam), rabbit polyclonal HA (ab9110; Abcam), transferrin receptor 
(H68.4; Invitrogen), mouse monoclonal haemagglutinin (Sigma), Flag (M2; 
Sigma), IRF3 (ZM3; Zymed), TGN46 (ab16059; Abcam), giantin (ab24586; 
Abcam), EEA1 (no. 2441; Cell Signaling), LAMP1 (NB120; Novus Biologicals) 
and Sec61β (Upstate). ELISA kits were obtained from the following sources: murine 
IFNβ and IFNα (PBL), murine IL6 (R&D systems or Quansys Biosciences), 
murine IL1β and IFNγ (R&D systems), active NF-κB p65 (Active Motif), and 
murine RANTES (Quansys Biosciences). 

Real-time PCR. Fluorescence real-time PCR analysis was performed using a 
LightCycler 2.0 instrument (Roche Molecular Biochemicals) and the following 
TaqMan Gene Expression Assays (Applied Biosystems): IFNβ (Mm00439546_s1), 
IFNα2 (Mm00833961_s1) and TRAPB (Mm00481383_m1). Relative amounts of 
mRNA were normalized to the 18S ribosomal RNA levels in each sample. 
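
As a rough illustration of this normalization, the sketch below uses the comparative Ct approach. That choice is an assumption: the text states only that mRNA amounts were normalized to 18S rRNA, not how. The function names and the Ct values are hypothetical, and one doubling of product per cycle is assumed.

```python
def relative_expression(ct_target: float, ct_18s: float) -> float:
    """Target mRNA level relative to 18S rRNA in the same sample.

    Assumes the comparative Ct method with one doubling per PCR cycle
    (an assumption, not something stated in the Methods).
    """
    delta_ct = ct_target - ct_18s
    return 2.0 ** (-delta_ct)


def fold_change(ct_target_treated: float, ct_18s_treated: float,
                ct_target_control: float, ct_18s_control: float) -> float:
    """Ratio of 18S-normalized expression in treated versus control cells."""
    treated = relative_expression(ct_target_treated, ct_18s_treated)
    control = relative_expression(ct_target_control, ct_18s_control)
    return treated / control


# Hypothetical Ct values: the target crosses threshold 5 cycles earlier
# after stimulation, while 18S rRNA is unchanged.
induction = fold_change(24.0, 10.0, 29.0, 10.0)
```

Normalizing each sample to its own 18S value before comparing samples is what keeps differences in input RNA from being read as changes in expression.
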
Reporter analysis. 293T cells seeded on 24-well plates were transiently trans- 
fected with 50 ng of the luciferase reporter plasmid together with a total of 600 ng 
of various expression plasmids or empty control plasmids. As an internal control, 
10 ng pRL-TK was transfected simultaneously. Then, 24 or 36 h later, the 
luciferase activity in the total cell lysate was measured. 
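
One common way to use such an internal control can be sketched as follows. The Methods do not say how the pRL-TK signal entered the calculation, so dividing the reporter reading by the internal-control reading and then expressing the result relative to empty-vector wells is an assumption. All names and readings below are hypothetical.

```python
from statistics import mean


def normalized_activity(reporter: float, internal_control: float) -> float:
    """Reporter luciferase reading divided by the internal-control reading."""
    if internal_control <= 0:
        raise ValueError("internal-control reading must be positive")
    return reporter / internal_control


def fold_induction(sample_wells, control_wells) -> float:
    """Mean normalized activity of sample wells relative to control wells.

    Each well is a (reporter, internal_control) pair of raw readings.
    """
    sample = mean(normalized_activity(r, c) for r, c in sample_wells)
    control = mean(normalized_activity(r, c) for r, c in control_wells)
    return sample / control


# Hypothetical raw readings for duplicate wells.
sting_wells = [(8000.0, 100.0), (12000.0, 100.0)]
empty_vector_wells = [(1000.0, 100.0), (1000.0, 100.0)]
induction = fold_induction(sting_wells, empty_vector_wells)
```
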

DNA vaccine. Mice were immunized with a plasmid encoding OVA by i.m. 
electroporation (100 μg per mouse). The booster immunization was given 
within 4 weeks of the primary immunization. 

Measurement of OVA-specific immune response. Spleens were extracted 
2 weeks after the second immunization and 5 × 10° spleen cells were seeded on 
96-well plates and then stimulated with synthetic peptide for OVA (H-2Kb 
SIINFEKL, Proimmune) at 10 μg ml⁻¹. After 3 days, the cell culture supernatants 
were collected and analysed for the IFNγ titre by ELISA (R&D systems). For 
intracellular IFNγ staining, stimulated splenocytes were stained using FITC-labelled 
anti-CD8 antibody (BD). After washing, cells were fixed and permeabilized. 
Then cells were stained using phycoerythrin (PE)-labelled anti-IFNγ antibody 
(BD). Flow cytometric analysis was performed on a FACSCalibur instrument 
(BD). The serum anti-OVA antibody titre was measured by ELISA. In brief, 
96-well plates were coated with an OVA protein at 1 μg ml⁻¹ and then blocked 
with PBS containing 5% skimmed milk. Plates were washed and overlaid with 
serially diluted serum for 1 h at room temperature. After washing, antibodies were 
detected using goat anti-mouse IgG conjugated to horseradish peroxidase (Jackson 
Immuno Research). After further washing, the plates were stained using 3,3',5,5'- 
tetramethylbenzidine (TMB, Sigma) as a substrate. The reaction was stopped with 
1 M H₂SO₄ and the absorbance was measured. Antibody titres were expressed as the 
reciprocal of the endpoint dilution after background subtraction. 
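
The endpoint-titre calculation described above can be sketched as follows. The positivity cutoff, the function name and the example readings are all hypothetical: the Methods do not state which threshold defined the endpoint dilution.

```python
def endpoint_titre(dilution_factors, absorbances, background, cutoff=0.1):
    """Reciprocal of the highest serum dilution that is still positive.

    dilution_factors: reciprocal dilutions, e.g. 100 for a 1:100 dilution.
    absorbances: raw readings, one per dilution.
    background: reading of the no-serum control, subtracted from each value.
    cutoff: hypothetical positivity threshold applied after subtraction.
    Returns 0 if even the lowest dilution is negative.
    """
    titre = 0
    # Walk from the most concentrated serum to the most dilute.
    for factor, reading in sorted(zip(dilution_factors, absorbances)):
        if reading - background > cutoff:
            titre = factor
        else:
            break  # first negative dilution ends the series
    return titre


# Hypothetical fourfold dilution series for one serum sample.
factors = [100, 400, 1600, 6400]
readings = [1.80, 0.90, 0.30, 0.08]
titre = endpoint_titre(factors, readings, background=0.05)
```
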

Fractionation. MAM, mitochondria and microsomes were isolated from 
Sting−/− MEFs stably transfected with mSTING-HA plasmid as previously 
described³⁰. In brief, cells were washed in PBS and pelleted by centrifugation 
at 1,000g for 10 min. The pellet was resuspended in sucrose homogenization 
buffer (0.25 M sucrose, 10 mM HEPES, pH 7.4), and cells were lysed by using 
a Dounce homogenizer. Lysed cells were centrifuged at 500g for 10 min, and the 
supernatant was collected. The supernatant was then centrifuged at 10,300g for 
10 min to separate the crude microsomal (microsome and cytosol) from the 
crude mitochondrial (MAM and mitochondria) fraction, and the crude microsomal 
fraction (supernatant) was subjected to ultracentrifugation at 100,000g for 
60 min. The crude mitochondrial fraction (pellet) was resuspended in ice-cold 
mannitol buffer A (0.25 M mannitol, 5 mM HEPES, 0.5 mM EDTA) and layered 
on top of 30% Percoll in mannitol buffer B (0.225 M mannitol, 25 mM HEPES, 
1 mM EDTA). Mitochondria and MAM fractions were separated by ultracentrifugation 
at 95,000g for 65 min. Both isolated fractions were diluted with 
mannitol buffer B and centrifuged at 6,300g for 10 min. The supernatant of 
MAM centrifugation was further separated by centrifugation at 100,000g for 
60 min and the pellet was used for the MAM fraction, whereas the pellet of the 
mitochondria centrifugation was used as the mitochondria fraction. All of the 
fractions were resuspended in mannitol buffer B. 

Confocal microscopy. For localization of Sec5 and LAMP1, cells grown on 
coverslips were fixed in 80%/20% methanol/acetone at −20 °C for 5 min. For 
EEA1 staining, cells were fixed with 4% paraformaldehyde in PBS for 15 min at 
37 °C, and were permeabilized in 0.2% Triton X-100. For staining of other 
proteins, cells were fixed with 4% formaldehyde in DMEM for 15 min at 37 °C, 
and were permeabilized with 0.2% Triton X-100. For mitochondria staining, living 
cells were incubated with 300 nM of MitoTracker Red (Invitrogen) for 45 min at 
37 °C. Fixed and permeabilized cells were pre-incubated with 0.1% BSA in PBS 
and were incubated with primary antibodies. Cells were then incubated with 
secondary antibodies conjugated with FITC, Cy3 or Cy5 (Sigma). 

RNA interference. Chemically synthesized 21-nucleotide siRNA duplexes were 
obtained from Dharmacon, Inc. The sequences of each siRNA oligonucleotide 
used in this study are as follows: murine Trapb siRNA, 
5'-UGAAAGAGAGGACGGGUUAUU-3'; murine Sec5 siRNA, 5'-AGAAGUAUUAGGUCGGAAA-3', 
5'-UCAACGUACUUCAGCGAUU-3', 5'-CAGCAGAGAUUACACGUCA-3', 
5'-GUGAGUGGCUUGCGCAGUA-3'; murine Sting siRNA, 
5'-CCAACAGCGUCUACGAGA-3'; murine Sec61b siRNA, 
5'-GCAAGUACACGCGAUCAUA-3', 5'-CAUCGCUGCUGUAUUUAUG-3', 
5'-CCACUGUUCGGCAGAGAAA-3', 5'-GGCGAUUCUACACGGAAGA-3'. Control siRNA was obtained 
from Dharmacon (D-001206-01-80). MEFs were transfected by using an 
Amaxa nucleofector apparatus (program A-023) and Amaxa MEF nucleofector 
kit 1 according to the manufacturer’s instructions. L929 cells were transfected 
using Lipofectamine RNAiMAX (Invitrogen). At 72 h after transfection, cells 
were used for further experiments. 

Statistics. Student’s t-test was used to analyse data. 
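
For orientation, a minimal sketch of the t statistic for two independent groups follows. It assumes the unpaired, equal-variance form of Student's t-test, which the Methods do not specify, and the measurements are hypothetical. A P value would additionally require the t distribution with the returned degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes independent groups with equal variances (an assumption,
    since the Methods name the test but not its variant).
    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances, n - 1).
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / dof
    standard_error = sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, dof


# Hypothetical cytokine readings (arbitrary units), n = 3 per group.
knockout = [1.0, 2.0, 3.0]
wild_type = [4.0, 5.0, 6.0]
t_value, dof = student_t(knockout, wild_type)
```

With group sizes as small as those reported in the figure legends (n = 3 to 7), the degrees of freedom are low, so a large absolute t value is needed to reach P < 0.05.
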


30. Mavinakere, M. S., Williamson, C. D., Goldmacher, V. S. & Colberg-Poley, A. M. 
Processing of human cytomegalovirus UL37 mutant glycoproteins in the 
endoplasmic reticulum lumen prior to mitochondrial importation. J. Virol. 80, 
6771-6783 (2006). 


Prohibitin couples diapause signalling to 
mitochondrial metabolism during ageing in C. elegans 


Marta Artal-Sanz¹ & Nektarios Tavernarakis¹ 


Marked alterations in cellular energy metabolism are a universal 
hallmark of the ageing process¹. The biogenesis and function of 
mitochondria, the energy-generating organelles in eukaryotic cells, 
are primary longevity determinants. Genetic or pharmacological 
manipulations of mitochondrial activity profoundly affect the lifespan 
of diverse organisms². However, the molecular mechanisms 
regulating mitochondrial biogenesis and energy metabolism during 
ageing are poorly understood. Prohibitins are ubiquitous, evolutionarily 
conserved proteins, which form a ring-like, high-molecular-mass 
complex at the inner membrane of mitochondria³. Here, we 
show that the mitochondrial prohibitin complex promotes longe- 
vity by modulating mitochondrial function and fat metabolism in 
the nematode Caenorhabditis elegans. We found that prohibitin 
deficiency shortens the lifespan of otherwise wild-type animals. 
Notably, knockdown of prohibitin promotes longevity in diapause 
mutants or under conditions of dietary restriction. In addition, 
prohibitin deficiency extends the lifespan of animals with compro- 
mised mitochondrial function or fat metabolism. Depletion of pro- 
hibitin influences ATP levels, animal fat content and mitochondrial 
proliferation in a genetic-background- and age-specific manner. 
Together, these findings reveal a novel mechanism regulating mito- 
chondrial biogenesis and function, with opposing effects on energy 
metabolism, fat utilization and ageing in C. elegans. Prohibitin may 
have a similar key role in modulating energy metabolism during 
ageing in mammals. 

The mitochondrial prohibitin complex comprises two subunits 
(PHB-1 and PHB-2) that assemble at the inner mitochondrial 
membrane⁴. Prohibitins have been implicated in several important cellular 
processes such as mitochondrial biogenesis and function, signalling, 
transcriptional control, cell death and replicative senescence. In addi- 
tion, prohibitins have been associated with various types of cancer 
(reviewed in ref. 5). Little is known about the role of prohibitin in 
chronological ageing. We examined the requirement for prohibitin 
during ageing in C. elegans. Prohibitin genes are widely expressed in 
animal tissues throughout development and during adulthood. 
Green fluorescent protein (GFP)-tagged PHB-1 and -2 co-localize 
in mitochondria (Supplementary Fig. 1; ref. 6). Elimination of either 
PHB-1 or PHB-2 by RNA interference (RNAi; Supplementary Fig. 2) 
disrupts the mitochondrial prohibitin complex and causes early 
embryonic lethality’. Homozygous mutants harbouring a null phb-1 
allele become gametogenesis-defective sterile adults due to maternal 
effect (see Methods). 

Post-embryonic RNAi knockdown of either phb-1 or phb-2 shortens 
the lifespan of otherwise wild-type worms (Fig. 1a and Supplementary 
Table 1). In sharp contrast, prohibitin deficiency markedly extends the 
lifespan of long-lived daf-2 mutants (Fig. 1b and Supplementary Table 
1). The insulin/insulin-like growth factor (IGF) receptor DAF-2 is a 
component of a signalling pathway regulating diapause entry (dauer 
larva formation)’. Longevity conferred by daf-2 mutations requires 


the DAF-16/FOXO transcription factor (reviewed in ref. 8). Loss of 
DAF-16 fully suppresses the exceptional longevity of prohibitin-depleted 
daf-2 mutants (Fig. 1c and Supplementary Table 1). 

The transforming growth factor-β (TGF-β) signal transduction 
pathway also controls diapause and ageing’. The daf-7 and daf-4 genes 
encode a TGF-β homologue and the type II, transmembrane TGF-β 
receptor serine/threonine kinase, respectively'*”’. Prohibitin depletion 

Figure 1 | Prohibitin deficiency markedly extends the lifespan of dauer- 
defective C. elegans mutants while shortening the lifespan of otherwise 
wild-type animals. The percentage of animals remaining alive is plotted 
against animal age. Assays were carried out at 20 °C. Combined lifespan data 
from independent experiments are given in Supplementary Table 1. 
a, Depletion of either PHB-1 or PHB-2 by RNAi in wild-type (N2) animals 
shortens lifespan. b, Prohibitin knockdown further extends the lifespan of 
long-lived, insulin signalling-defective daf-2(e1370) mutant animals. c, The 
longevity of prohibitin-depleted, daf-2(e1370) mutants is dependent on the 
transcription factor DAF-16/FOXO. d, Knockdown of either the phb-1 or the 
phb-2 gene extends the lifespan of daf-7(e1372) mutant animals, defective in 
TGF-β signalling. e, Survival curves of dauer-defective daf-4 mutants 
subjected to RNAi with either phb-1 or phb-2. f, Survival curves of dauer- 
defective daf-11 mutants subjected to RNAi with either phb-1 or phb-2. 

extends the lifespan of both daf-7 and daf-4 mutant animals (Fig. 1d, e 
and Supplementary Table 1). Furthermore, knockdown of phb-1 or 
phb-2 extends the lifespan of animals carrying a lesion in the daf-11 
gene, which encodes a transmembrane guanylate cyclase that functions 
via both the insulin/IGF and the TGF-β pathways to modulate dauer 
formation” (Fig. 1f and Supplementary Table 1). Thus, depending on 
the genetic background, prohibitin function has opposing effects on 
C. elegans ageing. Although depletion of prohibitin compromises sur- 
vival in wild-type animals, it substantially extends the lifespan of 
mutants defective in either of the two diapause signalling pathways. 
Both PHB-1 and PHB-2 proteins localize in mitochondria, where 
they form a high-molecular-mass complex (Supplementary Fig. 1; 
ref. 6). We investigated the role of prohibitin during ageing in animals 
carrying mutations that affect the mitochondrial electron transport 
chain. Knockdown of phb-1 or phb-2 extends the lifespan of gas-1 
mutants (Fig. 2a and Supplementary Table 1). The gas-1 gene encodes 
a homologue of the 49-kDa iron-sulphur subunit of the mitochon- 
drial electron transport chain complex I. Similarly, prohibitin deple- 
tion extends the lifespan of nematodes with lesions in the mev-1 and 
isp-1 genes (Fig. 2b, c and Supplementary Table 1). mev-1 and isp-1 
encode the succinate dehydrogenase cytochrome b, a component of 


[Figure 2 plots: survival curves, percentage survival against time (days), for 
a, gas-1(fc21); b, mev-1(kn1); c, isp-1(qm150); d, clk-1(e2519); e, eat-2(ad465). 
Each panel shows control RNAi, phb-1(RNAi) and phb-2(RNAi).] 


Figure 2 | Prohibitin deficiency further extends the lifespan of 
mitochondrial and dietary-restricted C. elegans mutants. Survival curves of 
mutant animal populations subjected to phb-1 or phb-2 RNAi are shown. 
a, Knockdown of prohibitin in mutants carrying a lesion in the gas-1 gene, 
which encodes a homologue of the 49-kDa iron-sulphur protein fraction 
subunit of the mitochondrial NADH:ubiquinone-oxidoreductase, a 
component of the mitochondrial electron transport chain complex I.
b, Knockdown of prohibitin in mev-1 mutants, which lack succinate
dehydrogenase cytochrome b, a component of the mitochondrial electron
transport chain complex II. c, Knockdown of prohibitin in isp-1 mutants,
deficient for the Rieske iron-sulphur protein (ISP), a subunit of the
mitochondrial electron transport chain complex III. d, Knockdown of
prohibitin in long-lived animals carrying a mutation in the clk-1 gene
encoding a mitochondrial ubiquinone biosynthesis enzyme. e, Knockdown
of prohibitin in long-lived, dietary-restricted eat-2 mutants. The percentage
of animals remaining alive is plotted against animal age. Lifespan values are 
given in Supplementary Table 1; assays were carried out at 20 °C.

complex II, and the Rieske iron-sulphur protein, a subunit of complex
III, respectively. In addition, prohibitin knockdown extends the lifespan
of clk-1 mutant animals (Fig. 2d and Supplementary Table 1),
which are defective in the biosynthesis of ubiquinone, an essential 
component of the electron transport chain. Hence, reduced prohibitin 
activity promotes survival of animals with compromised mitochon- 
drial function. Energy metabolism in mitochondria is also affected by 
dietary restriction. We find that prohibitin deficiency improves the 
survival of dietary-restricted eat-2 mutants (Fig. 2e and Supplemen- 
tary Table 1). 

To gain insight into the mechanism underlying the distinctive 
effects of prohibitin on ageing, we performed a temporal analysis 
of ATP levels during ageing (at day 3, 10 and 15 of adulthood) in 
wild-type animals and in daf-2 and daf-7 diapause mutants. We 
found that ATP levels are higher in diapause mutants compared to 
age-matched wild-type control animals. Knockdown of prohibitin 
specifically increased the levels of ATP in dauer-defective daf-2 and 
daf-7 mutants, progressively with age. In contrast, we did not detect 
ATP elevation in prohibitin-depleted wild-type animals during age- 
ing (Fig. 3a; data for day 10 shown). The significant energy surplus in 
daf-2 mutants lacking prohibitin correlates with their exceptionally 
long lifespan (Fig. 1b). Our findings indicate that prohibitin moder- 
ates ATP levels under conditions of reduced diapause signalling. 

What is the molecular basis of the different impact of prohibitin on 
ATP levels between wild type and diapause mutant animals? 
Mitochondrial energy metabolism is linked to fat utilization in both 
nematodes and mammals. We visualized fat depositions in the intestine 
of wild-type animals and diapause mutants during ageing, using the 
vital dye Nile red (see Methods). Fat accumulates during ageing in wild- 
type animals and in two representative diapause mutants (daf-2 and 
daf-7; Fig. 3b and Supplementary Fig. 3a). These observations were 
confirmed by Sudan black staining of fixed animals (Supplementary 
Fig. 3b). Prohibitin deficiency markedly reduces intestinal fat content 
early in adulthood, in all genetic backgrounds (Fig. 3b and Supplemen- 
tary Fig. 3a, day 5). However, the effect of prohibitin depletion 
diminishes with age in wild-type animals, whereas it remains strong 
in both diapause mutants (Fig. 3b and Supplementary Fig. 3a, day 10 
and day 15). Thus, prohibitin differentially modulates animal fat con- 
tent in a genetic background- and age-specific manner. 

The nuclear hormone receptor NHR-49 is a key regulator of fat 
mobilization, modulating fat consumption and maintaining a normal 
balance of fatty acid saturation. Elimination of NHR-49 causes fat 
accumulation due to reduced expression of fatty acid β-oxidation
enzymes such as the delta-9 stearoyl-CoA desaturase FAT-7, which
is required for the synthesis of monounsaturated fatty acids.
Prohibitin deficiency extends the lifespan of both nhr-49 and fat-7 
mutants (Fig. 3c, d; Supplementary Table 1). In addition, knockdown 
of prohibitin reduces intestinal fat in nhr-49 and fat-7 mutant animals 
(Supplementary Fig. 4a, b). Taken together, our findings indicate that 
prohibitin deficiency engages fat metabolism to promote longevity. 

Prohibitin has been implicated in several human cancers and is 
generally overexpressed in transformed cells compared with their 
non-transformed counterparts. We examined the requirement for
prohibitin activity during tumour formation in C. elegans. Although 
C. elegans somatic cells are post-mitotic, germ cells are continually 
dividing during oogenesis. gld-1 is a tumour suppressor gene that
encodes a protein containing a K homology RNA-binding domain
that is required for meiotic cell cycle progression during oogenesis.
gld-1 mutant animals develop lethal germline tumours and are short-
lived because of ectopic germ cell overproliferation in the gonad.
Germ cells eventually leak out of the gonad into the body cavity or, 
through the vulva, to the outside. We find that prohibitin deficiency 
blocks tumour formation and extends lifespan in gld-1 mutants 
(compare Supplementary Fig. 5a with b; Supplementary Table 1). 
These observations indicate a critical function of prohibitin in actively
proliferating cells.

Figure 3 | Effects of prohibitin depletion on energy metabolism. a, RNAi
knockdown of either phb-1 or phb-2 specifically increases ATP levels in
dauer-defective daf-2 or daf-7 mutants during ageing (day 10 of adulthood).
No effect is observed in wild-type (N2) animals (error bars denote standard
deviation; P < 0.005, unpaired t test; assays were carried out at 20 °C).
b, Quantification of intestinal fluorescence after Nile red staining of wild-
type animals and dauer-defective mutants, subjected to RNAi with either
phb-1 or phb-2 at day 5, day 10 and day 15 of adulthood (error bars denote
standard deviation; P < 0.005, unpaired t test; assays were carried out at
20 °C; see Methods). c, d, Survival curves of short-lived nhr-49 mutants
(c) and animals lacking the delta-9 desaturase FAT-7 (d), subjected to RNAi
with either phb-1 or phb-2.


To investigate the mechanism by which prohibitin influences 
mitochondrial activity to modulate longevity, we analysed cellular 
mitochondrial content during ageing. We found that prohibitin elimi- 
nation promotes adult-onset mitochondrial proliferation in intestinal 
fat-storing cells of wild-type animals whereas, strikingly, it reduces 
mitochondrial content in diapause mutants. Notably, these mutants
contain fewer mitochondria than wild type during late adulthood
(Fig. 4a and Supplementary Fig. 6a, b). Mitochondrial and fat
content is also reduced upon knockdown of phb genes in mutants with 
compromised mitochondrial function, in dietary restricted animals, 
and in fat metabolism mutants (Fig. 4b and Supplementary Fig. 7a). In 
Drosophila and in the adipose tissue of mice, FOXO transcription
factors inhibit mitochondrial proliferation. We observed a similar
effect in C. elegans diapause mutants (Fig. 4a), where DAF-16/FOXO
is derepressed. In contrast, prohibitin depletion does not significantly
alter fat or mitochondrial content in animals lacking DAF-16 (P > 0.1,
unpaired t test; Supplementary Fig. 8a). Hence, DAF-16 is required to
mediate the effects of prohibitin deficiency on fat and mitochondrial 
content in wild-type animals. However, elimination of prohibitin
during adulthood does not affect the subcellular localization of either
a wild-type or a constitutively nuclear DAF-16 reporter fusion 
(Supplementary Fig. 8b). In addition, constitutive nuclear localization 
of DAF-16 is not sufficient to extend the lifespan of prohibitin- 
depleted animals (Supplementary Table 1). Moreover, the ageing 
effects of knockout or overexpression of sirtuin SIR-2.1, a regulator 
of DAF-16, are independent of prohibitin (Supplementary Table 1). 

Together, our findings show a new mechanism that couples nutrient 
availability and diapause signals with energy metabolism during age- 
ing. We hypothesize that prohibitin normally functions to promote 
longevity by moderating fat utilization and energy production via the 
mitochondrial respiratory chain. Under conditions that favour dia- 
pause, such as limited nutrient availability, cells adapt by shifting 
towards fermentative metabolism. Under such conditions, when
energy demands exceed the capacity of mitochondrial respiration, 
prohibitin deficiency is beneficial for survival. We tested this hypo- 
thesis by monitoring survival of animals lacking prohibitin at a higher 
temperature (25 °C), where metabolic activity is elevated and energy 
demand is higher. Notably, whereas knockdown of prohibitin shortens 
lifespan at 20 °C it extends lifespan at 25 °C (Supplementary Table 1). 
We also tested the requirement for prohibitin under acute thermal 
stress (35 °C). Prohibitin deficiency renders wild-type animals strongly 
thermotolerant and further enhances the thermotolerance of daf-2 
diapause mutants (Supplementary Fig. 9a, b). Thus, the metabolic 
state determines whether prohibitin will promote or compromise 
longevity. Increased thermotolerance is independent of DAF-16/ 
FOXO (Supplementary Fig. 8c), indicating that other pathways are 
involved in metabolic changes elicited by prohibitin deficiency. 

We also examined the effects of prohibitin depletion on animals 
under oxidative stress. We induced oxidative stress by using sodium 
azide (NaN3), a potent and specific inhibitor of cytochrome c oxidase,
a component of the mitochondrial electron transport chain
complex IV. Prohibitin depletion enhances survival after treatment 
with sodium azide during adulthood in diapause mutants 
(Supplementary Fig. 9c). Similarly, prohibitin deficiency increases 
resistance of adult daf-2 mutant animals to the herbicide paraquat 
(N,N’-dimethyl-4,4’-bipyridinium dichloride), a generator of super- 
oxide anions (Supplementary Fig. 9d). By contrast, lack of prohibitin 
compromises paraquat resistance during adulthood in an otherwise 
wild-type genetic background and during L4 larval development in 
both wild-type and daf-2 mutant animals (Supplementary Fig. 9e, f; 
ref. 6). Therefore, elimination of prohibitin under conditions of 
reduced diapause signalling further increases oxidative stress resist- 
ance during adulthood. We conclude that lack of prohibitin 
diminishes mitochondrial proliferation during ageing, augments 
oxidative stress resistance and extends lifespan, specifically under 
conditions of reduced insulin/IGF and TGF-β signalling.

Mitochondria are the main sites of reactive oxygen species genera- 
tion within cells. We measured reactive oxygen species formation upon 
prohibitin depletion in both wild-type and daf-2 mutant adult animals 
under normal and oxidative stress conditions. Reactive oxygen species 
levels were slightly lower in daf-2 mutants compared to wild type 
(Supplementary Fig. 9g). Notably, although prohibitin deficiency 
increased reactive oxygen species formation in wild-type animals, it 
reduced reactive oxygen species levels in daf-2 mutant adults (Sup- 
plementary Fig. 9g). We also assessed mitochondrial membrane poten- 
tial and oxygen consumption in prohibitin-deficient, wild-type and 
daf-2 mutant animals. Knockdown of prohibitin slightly reduces mito- 
chondrial membrane potential, while selectively increasing oxygen 
consumption in daf-2 mutants (Supplementary Fig. 10a, b). Paradoxi- 
cally, reactive oxygen species formation has been shown to underlie 
oxidative stress resistance and lifespan extension under glucose restriction
in C. elegans. It has been suggested that stress resistance and
longevity are due to induction of a hormetic response (mitohormesis).
We investigated whether increased reactive oxygen species
formation augments stress resistance and extends lifespan in prohibitin-
deficient animals under stress. We treated prohibitin-depleted

Figure 4 | Prohibitin depletion and intestinal fat-storing cell mitochondrial
content. a, MitoTracker Deep Red 633 staining of intestinal mitochondria in
wild-type animals and dauer-defective daf-2 or daf-7 mutants subjected to
RNAi with either phb-1 or phb-2. Images were acquired under the same
exposure, using a ×40 objective lens, at day 3, day 5 and day 10 of adulthood
(see Methods; the anterior part of the intestine is shown, the head is located


animals experiencing either oxidative stress (mev-1 mutants) or thermal 
stress (wild-type nematodes grown at 25 °C) with N-acetylcysteine, a
compound that functions as a free-radical scavenger. We found no
effect on longevity conferred by prohibitin knockdown (Supplementary 
Table 1 and Supplementary Fig. 11). Thus mitohormesis is unlikely to 
mediate the effects of prohibitin elimination on ageing. 

The AMP-dependent kinase (AMPK) AAK-2 has been implicated 
in coupling energy levels and insulin/IGF-1 signals to modulate lifespan
in C. elegans. AMPK targets p53 to promote cell survival
under conditions of nutrient deprivation. We find that prohibitin
deficiency shortens the lifespan of both aak-2 and p53 (cep-1) mutant 
animals (Supplementary Table 1 and Supplementary Fig. 12a, b). 
Interestingly, aak-2 mutants contain more mitochondria than wild- 
type animals. Depletion of prohibitins further increases mitochon- 
drial proliferation in this genetic background (Supplementary 
Fig. 12c, d). In addition, AAK-2 deficiency ameliorates fat content 
reduction upon prohibitin depletion (Supplementary Fig. 12e, f), 
indicating that AAK-2 is involved in mediating prohibitin effects 
on fat content. We also investigated the involvement of the mitogen- 
activated protein kinase (MAPK) JNK-1, which promotes DAF-16/ 
FOXO nuclear localization under conditions of stress, and the Akt/
PKB homologue AKT-1, which transduces insulin/IGF-1 signals, in
mediating the effects of prohibitin depletion on metabolism and age- 
ing. We found that phb gene knockdown marginally extends the life- 
span of animals overexpressing jnk-1, whereas it shortens lifespan in 
animals lacking JNK-1 (Supplementary Table 1 and Supplementary 
Fig. 13a). Overexpression of JNK-1 in wild-type animals reduces mito- 
chondrial content and suppresses mitochondrial proliferation upon 
prohibitin depletion. By contrast, mitochondrial content is higher in 
jnk-1 mutants and remains unchanged after prohibitin removal 
(Supplementary Fig. 13b, c). Similarly, prohibitin deficiency does 
not alter fat content in jnk-1 mutant animals (Supplementary

at the top; bar, 50 μm). b, MitoTracker Deep Red 633 staining of intestinal
mitochondria in gas-1(fc21), isp-1(qm150), clk-1(e2519), eat-2(ad465) and
nhr-49(gk405) mutants subjected to RNAi with either phb-1 or phb-2.
Images were acquired under the same exposure, using a ×40 objective lens at
day 5 of adulthood.


Fig. 13d, e). Thus, JNK-1 is required for mitochondrial proliferation 
and reduction of fat content in animals lacking prohibitin. 
Elimination of prohibitin does not shorten the lifespan of animals 
without AKT-1 (Supplementary Table 1 and Supplementary Fig. 
13a). Mitochondrial content is reduced in akt-1 mutants and does 
not increase upon prohibitin depletion, compared to wild type 
(Supplementary Fig. 13b, c). In contrast, fat content in akt-1 mutants 
is higher than in wild type and sharply diminishes in the absence of 
prohibitin, similarly to daf-2 mutant animals (Supplementary 
Fig. 13d, e). Taken together, our observations indicate that the JNK- 
1 kinase, in part, mediates the effects of prohibitin deficiency on fat 
metabolism and mitochondrial proliferation. This response is poten- 
tiated under conditions of low diapause signalling, where AKT-1 
activity is reduced and DAF-16/FOXO nuclear localization is not 
blocked. 

In mammalian cells and in C. elegans, loss of prohibitin disrupts 
the reticular mitochondrial network and leads to accumulation of 
fragmented mitochondria. Prohibitin maintains mitochondrial
integrity and biogenesis by stabilizing the dynamin-like GTPase
OPA1, a core component of the mitochondrial fusion machinery,
which is required for mitochondrial fusion and cristae maintenance,
and has been implicated in the pathogenesis of inherited autosomal
dominant optic atrophy. The eat-3 gene encodes the C. elegans
homologue of OPA1. We examined whether prohibitin functions 
through OPA1 to regulate metabolism during ageing, by analysing 
the effects of EAT-3 depletion on fat metabolism, mitochondrial 
content, ATP levels and lifespan, in wild type and insulin/IGF-1
signalling-deficient animals. In contrast to prohibitin depletion, loss 
of EAT-3 extends the lifespan of otherwise wild-type animals, whereas 
it shortens the lifespan of long-lived daf-2 mutants (Supplementary 
Table 1 and Supplementary Fig. 14a, b). We did not observe significant 
alterations of fat metabolism, ATP levels or mitochondrial content in
EAT-3-deficient animals (Supplementary Fig. 14c-e). Our findings
indicate that prohibitin functions independently of EAT-3/OPA1 to 
regulate metabolism during ageing. 

What is the origin of prohibitin effects on metabolism and ageing? 
We propose that under normal conditions the mitochondrial prohibi- 
tin complex promotes longevity by moderating fat metabolism, mito- 
chondrial proliferation and energy levels in C. elegans. In addition to 
maintaining normal mitochondrial metabolism, prohibitin functions 
as a negative regulator of mitochondrial proliferation in wild-type ani- 
mals (Supplementary Fig. 15). Elimination of prohibitin may activate a 
cellular retrograde response that induces mitochondrial overprolifera- 
tion. In turn, accumulation of defective mitochondria lacking prohibi- 
tin results in increased reactive oxygen species production, metabolic 
defects, consequent cellular damage and reduced lifespan. Interestingly, 
prohibitin depletion elicits reactive-oxygen-species-dependent Akt
hyperactivation in endothelial cells. Under low diapause signalling
and stress conditions, where AKT-1-mediated inhibition of DAF-16/ 
FOXO nuclear localization is relieved, the JNK-1 and AAK-2 stress- 
related signalling pathways are activated by prohibitin depletion, 
adjusting cellular metabolism towards fat utilization and promoting 
longevity. 

Our study reveals an important role of prohibitin in regulating fat 
metabolism and mitochondrial proliferation during ageing. The 
opposing effects of prohibitin on longevity indicate that specific 
cellular mechanisms may differentially regulate ageing, depending 
on extrinsic or intrinsic cues such as diapause signalling or energy 
demands. The tight evolutionary conservation and ubiquitous 
expression of prohibitin proteins indicate a similar role during ageing 
in other organisms. 


METHODS SUMMARY 

Lifespan analysis. Lifespan assays were performed at 20 °C unless noted otherwise. 
Synchronous animal populations were generated by hypochlorite treatment of 
gravid adults to obtain tightly synchronized embryos that were allowed to develop 
into adulthood under appropriate, defined conditions. Animals were transferred 
to fresh plates in groups of 10-20 worms per plate for a total of 100-150 individuals 
per experiment. The day of egg harvest was used as t = 0. Animals were transferred 
to fresh plates every 2-4 days thereafter and were examined every day for touch- 
provoked movement and pharyngeal pumping, until death. Survival curves were 
generated using the product-limit method of Kaplan and Meier. The log-rank 
(Mantel-Cox) test was used to evaluate differences in survival and determine P
values. 

MitoTracker staining. Animals were stained overnight on plates containing 
MitoTracker Deep Red 633 at a final concentration of 100 nM. Animals were
mounted on 2% agarose pads in M9 buffer containing 10 mM sodium azide and
scanned at room temperature with a 637 nm laser beam, under a confocal
microscope. 


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS

Strains and genetics. We followed standard procedures for C. elegans strain
maintenance. Nematode rearing temperature was kept at 20 °C, unless noted
otherwise. The following strains were used in this study: N2: wild-type Bristol
isolate, CB1370: daf-2(e1370)III, CB1372: daf-7(e1372)III, CB4876:
clk-1(e2519)III, CF1139: daf-16(mu86)I;muIs61, CF1308: daf-16(mu86)I;muEx116,
CF1371: daf-16(mu86)I;muEx151, CW152: gas-1(fc21)X, DA465: eat-2(ad465)II,
DR26: daf-16(m26)I, DR47: daf-11(m47)V, DR63: daf-4(m63)III, DR1309:
daf-16(m26)I;daf-2(e1370)III, GR1307: daf-16(mgDf50)I, HT941: lpIn1, JK1466:
gld-1(q485)/dpy-5(e61) unc-13(e51)I, LG100: geIn3, MQ887: isp-1(qm150)IV, RB754:
aak-2(ok524)X, RB759: akt-1(ok525)V, TK22: mev-1(kn1)III, VC8: jnk-1(gk7)IV,
VC199: sir-2.1(ok434)IV, VC870: nhr-49(gk405)I, fat-7(tm0326), XY1054:
cep-1(lg12501)I, N2Ex[pphb-1PHB-1::GFP pRF4], N2Ex[pphb-2PHB-2::GFP pRF4]
and phb-1(tm2571)I;sDp2(I;f). The sDp2(I;f) balancer chromosomal duplication
is unstable during meiosis and is lost in about 30% of the progeny, which become
sterile homozygous phb-1(tm2571) adults.

Molecular cloning. For engineering phb-1 and phb-2 dsRNA-producing 
Escherichia coli bacteria, the corresponding genomic DNA fragments, previously 
inserted into the pBluescript II plasmid vector, were excised by SacI/KpnI and
KpnI/SpeI, respectively, and sub-cloned into the pL4440 RNAi vector. The
resulting plasmid construct was used to transform HT115(DE3) E. coli bacteria, 
deficient for RNase III. Bacteria carrying an empty vector were used in control
experiments. To generate pphb-1PHB-1::GFP and pphb-2PHB-2::GFP full-length
GFP reporter fusions, DNA fragments derived from the phb-1 and phb-2 loci
were PCR-amplified using appropriate oligonucleotide primers and fused to
GFP. For pphb-1PHB-1::GFP, the primers
5'-AACTGCAGCTCAACGCGTGAGCCATACC-3' and
5'-GCTCTAGAGGATTGAAGGTTGAGAAGG-3' were used
to amplify a 1.7 kb DNA fragment encompassing the promoter plus the full
coding region of phb-1, which was digested with PstI/XbaI and inserted into
plasmid vector pPD95.77. Similarly, for pphb-2PHB-2::GFP, the primers
5'-ACATGCATGCGAGTCAGAGATAAAGACCG-3' and
5'-GCTCTAGAGCGTCTTTTGTCGGTCAC-3' were used to amplify a 1.9 kb DNA
fragment encompassing the promoter plus the full coding region of phb-2,
which was digested with SphI/XbaI and inserted into pPD95.77. Reporter
constructs were injected into the
gonads of wild-type animals together with pRF4, a plasmid that carries the
rol-6(su1006) dominant transformation marker. Two independent transgenic
lines were obtained for the pphb-1PHB-1::GFP plasmid construct and roller
hermaphrodites were examined for reporter fusion expression. For
pphb-2PHB-2::GFP, we obtained and examined numerous (>50) F1 transgenic progeny,
none of which propagated to generate a stable transgenic line, probably because
the GFP moiety of the fusion interferes with the formation of the complex.
Transgenic animals were mounted on a 2% agarose pad in M9 buffer, containing
10 mM sodium azide and scanned at room temperature with a 488 nm laser
beam, under a confocal microscope (Zeiss AxioScope with a Bio-Rad
Radiance 2100 scanhead). Images were acquired using a 515 ± 15 nm band-pass
filter and a ×40 Plan-NEOFLUAR objective (numerical aperture 0.75).

Lifespan analysis. Lifespan assays were performed at 20 °C unless noted otherwise.
Synchronous animal populations were generated by hypochlorite treatment of 
gravid adults to obtain tightly synchronized embryos that were allowed to develop 
into adulthood under appropriate, defined conditions. For RNAi lifespan experi- 
ments worms were placed on NGM plates containing 1-2 mM IPTG and seeded 
with HT115(DE3) bacteria transformed with either the pL4440 vector or the test 
RNAi construct. Progeny were grown at 20 °C unless noted otherwise, through the 
L4 larval stage and then transferred to fresh plates in groups of 10-20 worms per 
plate for a total of 100-150 individuals per experiment. The day of egg harvest and 
initiation of RNAi was used as t = 0. Animals were transferred to fresh plates every 
2-4 days thereafter and were examined every day for touch-provoked movement 
and pharyngeal pumping, until death. Worms that died due to internally hatched 
eggs, an extruded gonad or desiccation due to crawling on the edge of the plates, 
were censored and incorporated as such into the data set. Each survival assay was 
repeated at least three times and figures represent typical assays. Survival curves 
were created using the product-limit method of Kaplan and Meier. The log-rank
(Mantel-Cox) test was used to evaluate differences between survival curves and
determine P values. We used the Prism software package (GraphPad Software) to carry
out statistical analysis and to determine lifespan values. 
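
The product-limit estimate and the log-rank comparison described above can be illustrated with a short Python sketch. It is not the code used in this study (the analysis was run in Prism); the function names and the convention of encoding censored animals as event = 0 are assumptions made for the illustration.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- day of death or censoring for each animal
    events -- 1 if the animal died, 0 if it was censored
    Returns a list of (day, surviving fraction) for each day with a death.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for day in sorted(set(times)):
        d = deaths.get(day, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((day, survival))
        # animals that died or were censored on this day leave the risk set
        at_risk -= sum(1 for t in times if t == day)
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test; returns (chi-square, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    death_days = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for day in death_days:
        n_a = sum(1 for t in times_a if t >= day)
        n_b = sum(1 for t in times_b if t >= day)
        d_a = sum(1 for t, e in zip(times_a, events_a) if t == day and e)
        d_b = sum(1 for t, e in zip(times_b, events_b) if t == day and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = observed_minus_expected ** 2 / variance
    # upper tail of a chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

Animals lost to internal hatching, gonad extrusion or desiccation would be passed with event = 0, so that they count in the risk set up to the day they were censored but never as deaths.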

Stress resistance assays. To evaluate thermotolerance, four-day-old adult her- 
maphrodites were placed on pre-warmed (35 °C) NGM plates and incubated at 
35 °C. At the indicated times, plates were removed and worms were scored for
motility, provoked movement and pharyngeal pumping. Worms failing to display 
any of these traits were scored as dead. Three distinct populations of 30 adults 
were scored repeatedly over the assay period. Statistical tests were performed 
using Kaplan-Meier survival analysis, as described above for lifespan data.
To analyse oxidative stress resistance, 7-day-old adults were exposed to 1 mM
sodium azide (Sigma-Aldrich) for 18 h, on RNAi plates, at 20 °C. Animals were
scored for survival after a 3-h recovery period. To assay paraquat resistance, 
7-day-old adults were exposed to 40 mM paraquat (Aldrich) on RNAi plates at 
20 °C and survival was scored from day 8 of adulthood. For paraquat resistance of 
L4 larvae, animals were exposed to 2 mM paraquat on RNAi plates at 20 °C and 
survival was scored every 2 days. The percentage of surviving animals for each 
drug treatment was calculated in three independent experiments. In each experi- 
ment, 100 animals were analysed. Statistical analysis of data was performed using 
the Excel software package (Microsoft). 

Fat staining. Nile red powder (catalogue number N3013, Sigma-Aldrich) was 
dissolved in DMSO at 5 mg ml⁻¹, diluted in M9 and added on top of nematode
growth media (NGM) plates seeded with HT115(DE3) E. coli bacteria harbouring
the appropriate RNAi plasmids, to a final concentration of 0.02 μg ml⁻¹.
Synchronous embryos were allowed to develop into adulthood and grow con- 
tinuously on Nile red-containing plates. The extent of fat staining was assessed at 
specific time points by epifluorescence microscopy. Animals were observed
using a ×20 Plan-NEOFLUAR objective (numerical aperture 0.50), coupled
with a 546 ± 12 nm band-pass excitation and a 590 nm long-pass emission filter,
on a Zeiss AxioPlan microscope (Carl Zeiss). Images were acquired using a Zeiss 
AxioCam digital colour camera. Emission intensity was measured on greyscale 
images with a pixel depth of 8 bit (256 shades of grey). Average pixel intensity 
values were calculated by sampling three images of different animals, three times 
each (nine measurements total for each strain/condition). We calculated the 
mean and maximum pixel intensity for each animal in these images using the 
ImageJ software (http://rsb.info.nih.gov/ij/). For each experiment, at least 50 
images were processed over at least five independent trials. Because recent 
studies suggested that Nile red may not accurately indicate fat content in 
insulin/IGF-1 mutants, owing to uptake and/or anatomical issues, animals
were also stained with Sudan black (Sigma-Aldrich) as described previously.
Briefly, non-starved animals were collected in M9 buffer and washed three times.
Animals were then fixed by adding 10% paraformaldehyde solution to a final
concentration of 1%. Fixed animals were frozen at −80 °C and underwent three
cycles of freeze-thawing before washing in cold M9 buffer three times. Animals
were subsequently dehydrated by ethanol washes (serially in 25%, 50% and 70%
ethanol). For staining, three volumes of saturated Sudan black B solution (in
70% ethanol) were added to worms. Animals were incubated overnight and
washed thrice with 70% ethanol before observation. Animals were observed
using a ×20 Plan-NEOFLUAR objective (numerical aperture 0.50) on a Zeiss
AxioPlan microscope (Carl Zeiss). Images were acquired using a Zeiss AxioCam 
digital camera. Emission intensity was measured on greyscale images with a pixel 
depth of 8 bit (256 shades of grey). Average pixel intensity values were calculated 
by sampling three images of different animals, three times each (nine measurements
total for each strain). We calculated the mean and maximum pixel intensity
for each animal in these images using the ImageJ software
(http://rsb.info.nih.gov/ij/). Numbers obtained were subtracted from 255 to obtain the values
depicted in Supplementary Fig. 3. For each experiment, at least 20 images were 
processed over at least three independent trials. 
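
The intensity arithmetic described above can be sketched as follows. This is an illustrative Python sketch, not the ImageJ procedure itself: the function names are hypothetical, an image is assumed to be a list of rows of 8-bit values (0-255), and selecting the region to sample is left out.

```python
def intensity_stats(image):
    """Mean and maximum pixel value of an 8-bit greyscale image.

    image -- list of rows, each a list of integers between 0 and 255
    """
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels), max(pixels)


def average_intensity(measurements):
    """Average of repeated mean-intensity measurements (nine per strain in the text)."""
    return sum(measurements) / len(measurements)


def sudan_black_signal(mean_intensity):
    """Invert a bright-field reading so that darker staining gives a larger value."""
    return 255 - mean_intensity
```

The inversion applies only to Sudan black images, where stained fat appears dark; Nile red fluorescence is used directly.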

MitoTracker staining. Animals were stained overnight on RNAi plates contain- 
ing MitoTracker Deep Red 633 (catalogue number M-22426; Molecular Probes, 
Invitrogen) at a final concentration of 100 nM. Animals were mounted on a 2%
agarose pad in M9 buffer containing 10 mM sodium azide and scanned at room
temperature with a 637 nm laser beam, under a confocal microscope (Zeiss
AxioPlan coupled to a Bio-Rad Radiance 2000 laser scanning system). Emission
images were acquired using a 660 nm long-pass filter and a ×40 Plan-NEOFLUAR
objective (numerical aperture 0.75), and processed with Bio-Rad LaserSharp
2000 software. 

ATP measurements. To determine ATP content, 50 age-matched animals were 
collected in 50 μl of S Basal buffer and frozen at −80 °C. Nematodes were collected
at the L4 stage of development and on day 2, day 10 and day 15 of
adulthood. Frozen worms were immersed in boiling water for 15 min, cooled 
and centrifuged to pellet insoluble debris. The supernatant was moved to a fresh 
tube and diluted tenfold before measurement. ATP content was determined by 
using the Roche ATP bioluminescent assay kit HSII (Roche Applied Science) and 
a TD-20/20 luminometer (Turner Designs). ATP levels were normalized to total 
protein content. 
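
As a worked illustration of the normalization step, the sketch below corrects a luminometer-derived ATP value for the tenfold dilution and divides by protein content. The function name, argument names and units are assumptions; the text does not specify how the calculation was implemented.

```python
def normalized_atp(measured_atp, protein, dilution_factor=10.0):
    """ATP in the undiluted supernatant per unit of total protein.

    measured_atp    -- ATP value read from the diluted sample
    protein         -- total protein content of the same sample
    dilution_factor -- 10 for the tenfold dilution described in the text
    """
    if protein <= 0:
        raise ValueError("protein content must be positive")
    return measured_atp * dilution_factor / protein
```

Expressing ATP per unit of protein lets samples of 50 animals be compared even when body size differs between genotypes or ages.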

Quantification of reactive oxygen species production. Reactive oxygen species 
formation was quantified as described. Briefly, we used the membrane-
permeable non-fluorescent dye 2,7-dichlorodihydrofluorescein-diacetate
(H2-DCF-DA) (Sigma-Aldrich). H2-DCF-DA is deacetylated and becomes membrane
impermeable after entering the cell. H2-DCF fluoresces upon oxidation to 2,7- 
dichlorofluorescein (DCF) by reactive oxygen species. Young adults treated as 
described above were washed off of the plates with M9 buffer. After washing to 
reduce bacterial content, a 50 µl volume of worm suspension was pipetted in four
replicates into the wells of a 96-well plate with opaque walls and bottom and
allowed to equilibrate to room temperature. A fresh 100 µM H2-DCF-DA solution
(50 µl) was pipetted to the suspensions, resulting in a final concentration of
50 µM. Basal fluorescence was measured after addition of H2-DCF-DA, in a
microplate reader at excitation/emission wavelengths of 485 and 520 nm. Plates 
were kept for 1 h shaking at 20 °C. Then, a second measurement was performed. 
The initial fluorescence and the fluorescence signals of control wells were 
subtracted from the second measurement. Values were normalized to protein 
content. 
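
The arithmetic in this protocol can be illustrated with a minimal Python sketch. It checks the stated dilution (50 µl of 100 µM dye added to 50 µl of worm suspension) and applies the described background subtraction and protein normalization. The function names, the single `control_signal` value and the example readings are assumptions made for illustration; the text does not specify exactly how the control-well signal was computed.

```python
def final_concentration(stock_conc, stock_volume, sample_volume):
    """Concentration after adding stock_volume of dye to sample_volume of suspension."""
    return stock_conc * stock_volume / (stock_volume + sample_volume)


def ros_signal(first_read, second_read, control_signal, protein_mg):
    """Fluorescence increase over 1 h, minus the control-well signal, per mg protein.

    control_signal is assumed here to be one number already derived from the
    control wells; this is an illustrative simplification.
    """
    return (second_read - first_read - control_signal) / protein_mg


# 50 µl of 100 µM H2-DCF-DA added to 50 µl of worm suspension
print(final_concentration(100.0, 50.0, 50.0))  # 50.0 (µM)

# Hypothetical plate-reader values in arbitrary units, with 2 mg protein
print(ros_signal(first_read=200.0, second_read=1000.0,
                 control_signal=100.0, protein_mg=2.0))
```

Averaging the four replicate wells before or after normalization is a further choice that the text leaves open.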

Electrophoresis and western blot analysis. For one-dimensional SDS-PAGE, 
worm pellets were re-suspended in five volumes of SDS-sample buffer, boiled for 
5 min, and the proteins were resolved on 15% gels. Following electrophoresis, 
proteins were blotted to PVDF membranes, and immunoreactive material 
was visualized by chemiluminescent detection (ECL; Amersham) according to 
the manufacturer’s instructions. A polyclonal antibody raised against the 25 
carboxy-terminal amino acids of the murine PHB-1 protein has been described 
previously’’. Polyclonal antibody against the yeast β-subunit of F1-ATPase was a
gift from J. Berden. Anti-actin antibody was obtained from ICN (clone C4) and 
used at a dilution of 1:10,000. 

Mitochondrial DNA quantification. Mitochondrial DNA (mtDNA) was quan- 
tified using quantitative real time PCR as described previously’. We used the 
primers 5'-GTTTATGCTGCTGTAGCGTG-3' and 5'-CTGTTAAAGCAAGTGGACGAG-3'
(Mito1 set) for mtDNA. The results were normalized to genomic
DNA using the following primers specific for ama-1: 5'-TGGAACTCTGGAGTCACACC-3'
and 5'-CATCCTCCTTCATTGAACGG-3'. Quantitative PCR
was performed using the Bio-Rad CFX96 Real-Time PCR system, and was 
repeated three times. 

Oxygen consumption rate measurements. Oxygen consumption rates were 
measured as previously described*’ using a Clark-type electrode with some 
minor modifications (Hansatech Instruments). Young adult worms were 
washed and collected in S-basal buffer. Approximately 100 µl of slurry pellet of
worms were delivered into the chamber in 3 ml of S-basal medium. The chamber 
was kept at 25 °C, and measurements were done for 5-15 min, depending on the 
oxygen consumption rate. The slope of the straight portion of the plot was used 
to derive the oxygen consumption rate. Worms were recovered after respiration 
measurements and collected for protein quantification. Rates were normalized 
to protein content. We performed three independent measurements per strain. 
Statistical analysis was performed using the Excel software package (Microsoft).

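
Deriving a rate from the slope of the straight portion of the electrode trace can be sketched as follows. This is an illustrative least-squares fit in plain Python, not the authors' procedure: the window bounds, units (nmol O2 per ml), chamber volume and example trace are all hypothetical.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def consumption_rate(times_min, oxygen, start, stop, chamber_ml, protein_mg):
    """Oxygen consumed per minute per mg protein over the window [start, stop].

    oxygen is the dissolved O2 concentration (assumed nmol per ml); the slope
    is negative while oxygen is consumed, hence the sign flip.
    """
    window = [(t, o) for t, o in zip(times_min, oxygen) if start <= t <= stop]
    slope = linear_slope([t for t, _ in window], [o for _, o in window])
    return -slope * chamber_ml / protein_mg


# Hypothetical trace: oxygen falls by 2 nmol/ml per minute
times = list(range(11))
trace = [250.0 - 2.0 * t for t in times]
print(consumption_rate(times, trace, start=2, stop=9, chamber_ml=3.0, protein_mg=1.5))
```

Restricting the fit to an explicit window mirrors the choice of the straight portion of the plot, which in practice is usually made by eye.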
Membrane potential measurements. Mitochondrial membrane potential 
was measured in vivo using the fluorescent, lipophilic carbocyanine dye, 3,3’- 
dipropylthiacarbocyanine iodide (diS-C3(3); Sigma-Aldrich), as described™. 
Stained and washed worms were immobilized with Levamisole before mounting 
on 2% agarose pads for microscopic examination with a Zeiss AxioPlan micro- 
scope (Carl Zeiss) equipped with a Zeiss AxioCam digital colour camera. Images 
were acquired under the same exposure. Average pixel intensity values were
calculated by sampling three images of different animals, three times each (nine
measurements total for each strain/condition). We calculated the mean and
maximum pixel intensity for each animal in these images using the ImageJ
software (http://rsb.info.nih.gov/ij/). For each experiment, at least 50 images 
were processed over at least five independent trials. 

Prohibitin overexpression. phb-1 and phb-2 overexpression plasmids were con- 
structed by PCR amplification of the phb-1 and phb-2 loci, using primers 
5'-CTCAACGCGTGAGCCATACC-3' and 5'-CGACATCGGGGAATTGATTC-3'
for phb-1, and primers 5'-CGAGTCAGAGATAAAGACCG-3' and
5'-AACCGGGAATTACATTCCAG-3' for phb-2. The resulting 2.1 kb and 2.5 kb fragments
for phb-1 and phb-2, respectively, were inserted into the plasmid vector pCRII- 
TOPO (Invitrogen). The two constructs, either each alone or both, were injected 
into the gonads of wild-type animals, together with pPD118.33, a plasmid that 
carries a Pmyo-2GFP reporter fusion as transformation marker (pharyngeal muscle
GFP expression). pRF4, a plasmid that carries the rol-6(su1006) dominant
transformation marker, was also used. We have not been able to establish stable
transgenic lines overexpressing either one or both prohibitin genes. Although F1
transgenic animals expressing the co-injection marker were obtained, none of
the F1 transgenic progeny segregated phb-overexpressing F2 transgenic animals,
indicating that overexpression causes lethality. Growing F1 transgenic animals
on phb RNAi plates (and thus quenching phb gene overexpression) allowed
generation of F2 transgenics. These animals stopped propagating once shifted
onto regular OP50 plates. 
Genetic variation in IL28B and spontaneous clearance
of hepatitis C virus 


David L. Thomas’*, Chloe L. Thio’*, Maureen P. Martin?*, Ying Qi’, Dongliang Ge’, Colm O'hUigin’, Judith Kidd*, 
Hepatitis C virus (HCV) infection is the most common blood-borne
infection in the United States, with estimates of 4 million
HCV-infected individuals in the United States and 170 million 
worldwide’. Most (70-80%) HCV infections persist and about 
30% of individuals with persistent infection develop chronic liver 
disease, including cirrhosis and hepatocellular carcinoma’. 
Epidemiological, viral and host factors have been associated with 
the differences in HCV clearance or persistence, and studies have 
demonstrated that a strong host immune response against HCV 
favours viral clearance**. Thus, variation in genes involved in the 
immune response may contribute to the ability to clear the virus. 
In a recent genome-wide association study, a single nucleotide 
polymorphism (rs12979860) 3 kilobases upstream of the IL28B 
gene, which encodes the type III interferon IFN-λ3, was shown
to associate strongly with more than a twofold difference in 
response to HCV drug treatment’. To determine the potential 
effect of rs12979860 variation on outcome to HCV infection in a 
natural history setting, we genotyped this variant in HCV cohorts 
comprised of individuals who spontaneously cleared the virus 
(n= 388) or had persistent infection (n= 620). We show that 
the C/C genotype strongly enhances resolution of HCV infection 
among individuals of both European and African ancestry. To our
knowledge, this is the strongest and most significant genetic effect
associated with natural clearance of HCV, and these results implicate
a primary role for IL28B in resolution of HCV infection.

Table 1 | Characteristics of study subjects

Approximately 30% of individuals spontaneously clear acute HCV 
infection. Host genetic variation is assumed to explain the hetero- 
geneity in HCV clearance across individuals because such differences 
occur even after exposure to the same HCV inoculum and because 
there are ethnic differences in clearance frequency®’. Variation in 
genes involved in the immune response has already been linked to 
outcome of acute HCV infection*’, presumably owing to alteration 
in the strength and quality of the immune response. However, most 
variability in spontaneous HCV clearance remains unexplained. 

A recent genome-wide association study of >1,600 individuals 
chronically infected with hepatitis C participating in a clinical treatment
trial with pegylated interferon (IFN)-α and ribavirin identified a
single nucleotide polymorphism (SNP) on chromosome 19q13,
rs12979860, that was strongly associated with sustained virological
response (SVR)*°. This SNP maps 3 kilobases (kb) upstream of
the IL28B gene, which encodes the type III interferon IFN-λ3. The
C/C genotype was associated with a 2.5-fold or greater rate (depending
on ethnicity) of SVR compared with the T/T genotype, and the C 
allele was over-represented in a random multi-ethnic population as 


Characteristic                     Clearance (n = 388)    Persistence (n = 620)

Mean age (years)*                  33.9                   32.0
Male (%)                           78.6 (305)             80.2 (497)
European ancestry (%)              67.3 (261)             61.5 (381)
African ancestry (%)               25 (97)                31.1 (193)
Other (%)                          7.7 (30)               7.4 (46)
HBsAg status (% positive)†         9.7 (30)               3.0 (18)
HIV status (% positive)            19.3 (75)              24.4 (151)

rs12979860 allele frequency (%)
  C, European ancestry             80.3                   66.7
  C, African ancestry              56.2                   37
  T, European ancestry             19.7                   33.3
  T, African ancestry              43.8                   63

A total of 68.7% of study subjects was derived from cohorts that were matched on HIV status, gender and ethnicity. Numbers (n) are given in parentheses.
* There was one individual in the clearance group for which age was not available.
† Information for HBsAg (hepatitis B surface antigen) status was unavailable for 80 individuals in the clearance group and 26 in the persistence group.
Cancer Institute, Rockville, Maryland 20852, USA. ®Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland 21205, USA. ?Rho,
Table 2 | Effect of IL28B rs12979860 genotype on clearance of HCV

Genotype    Frequency of      Frequency of       Comparison              OR (95% CI)         P-value
            clearance (%)     persistence (%)

All subjects*
T/T         23.4 (37)         76.6 (121)         C/C versus T/T          0.29 (0.18-0.47)    4 × 10⁻⁷
C/T         29.5 (124)        70.5 (297)         C/C versus C/T          0.35 (0.25-0.48)    4 × 10⁻¹¹
C/T+T/T     28 (161)          72 (418)           C/C versus C/T+T/T      0.33 (0.25-0.45)    3 × 10⁻¹³
C/C         53 (227)          47 (202)           -                       -                   -

Subjects of European ancestry†
T/T         31.4 (16)         68.6 (35)          C/C versus T/T          0.50 (0.25-0.98)    0.04
C/T         27.8 (71)         72.2 (184)         C/C versus C/T          0.36 (0.24-0.52)    1 × 10⁻⁷
C/T+T/T     28.4 (87)         71.6 (219)         C/C versus C/T+T/T      0.38 (0.26-0.54)    1 × 10⁻⁷
C/C         51.8 (174)        48.2 (162)         -                       -                   -

Subjects of African ancestry†
T/T         20.8 (20)         79.2 (76)          C/C versus T/T          0.21 (0.10-0.44)    3 × 10⁻⁵
C/T         33 (45)           67 (91)            C/C versus C/T          0.40 (0.21-0.75)    0.005
C/T+T/T     28 (65)           72 (167)           C/C versus C/T+T/T      0.32 (0.17-0.57)    1 × 10⁻⁴
C/C         55.2 (32)         44.8 (26)          -                       -                   -

OR, odds ratio; CI, confidence interval. Numbers (n) are indicated in parentheses for columns two and three.
* OR and P-values for all subjects were adjusted by cohort and ethnicity.
† OR and P-values for subjects of European and African ancestry were adjusted by cohort.


compared with the chronically infected study cohort, raising the 
possibility that the C allele may favour spontaneous clearance of HCV. 

To address directly the role of the rs12979860 SNP in HCV clearance,
we genotyped 1,008 individuals from 6 independent HCV
cohorts composed of individuals who cleared virus (n = 388) and
individuals with persistent infection (n = 620). Genotypes were in
Hardy–Weinberg equilibrium in individuals of both African and
European ancestry (P = 0.47 and 0.77, respectively). The frequency
of the C allele was significantly greater among individuals of
European ancestry than those of African ancestry in both the clearance
(P = 3 × 10⁻¹⁰) and persistence groups (P = 1 × 10⁻²¹)
(Table 1). In both ethnic groups, however, there were significant
differences in allele frequencies (C versus T) between the clearance
and persistence groups, where the C allele showed greater frequencies
in the clearance group than in the persistence group (80.3% versus
66.7%, respectively, in individuals of European ancestry,
P = 7 × 10⁻⁸; 56.2% versus 37%, respectively, in individuals of
African ancestry, P = 1 × 10⁻⁵).
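
A Hardy–Weinberg check of this kind can be sketched with a one-degree-of-freedom chi-square test. The genotype counts below are the sums of the clearance and persistence counts in Table 2; the choice of a chi-square test is an assumption, because the text does not say which test produced the reported P-values, so the output is not expected to match them exactly.

```python
import math


def hwe_chi_square(n_cc, n_ct, n_tt):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions at a biallelic SNP.

    Returns (chi2, p_value). For 1 d.f. the chi-square survival function
    equals erfc(sqrt(chi2 / 2)), so only the standard library is needed.
    """
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)          # frequency of the C allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_ct, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Genotype totals (clearance + persistence) taken from Table 2
european = hwe_chi_square(n_cc=336, n_ct=255, n_tt=51)
african = hwe_chi_square(n_cc=58, n_ct=136, n_tt=96)
print("European ancestry: chi2 = %.3f, P = %.2f" % european)
print("African ancestry:  chi2 = %.3f, P = %.2f" % african)
```

An exact test would be a reasonable alternative for the smaller groups.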

More striking differences were observed in an analysis of genotype
frequencies, where patients with the C/C genotype were three times
more likely to clear HCV relative to patients with the C/T and T/T
genotypes combined (odds ratio (OR) = 0.33, P < 10⁻¹² for combined
ethnic groups; Table 2 and Fig. 1). Stratification of this analysis by
ethnicity indicated that the strength of the protective C/C effect was
similar in individuals of African and European ancestry (OR = 0.32,
P = 1 × 10⁻⁴ and OR = 0.38, P = 1 × 10⁻⁷, respectively). However, a
comparison of the C/C to the T/T group alone suggested stronger
protection conferred by C/C in individuals of African ancestry
(OR = 0.21, P = 3 × 10⁻⁵) relative to that in individuals of European
ancestry (OR = 0.50, P = 0.04), although our power to detect a true
difference is limited owing to small sample sizes in some groups.
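
To make the direction of these odds ratios concrete, the following sketch computes a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval from the counts in Table 2. It is only an illustration: the published values were adjusted by cohort and ethnicity using logistic regression, so the crude estimate is expected to be close to, but not identical with, the reported 0.33 (0.25-0.45). The function name and the reading of the OR as odds of clearance in the comparison genotype relative to C/C are assumptions.

```python
import math


def crude_odds_ratio(cleared_cmp, persisted_cmp, cleared_ref, persisted_ref, z=1.96):
    """Odds of clearance in the comparison group relative to the reference group.

    Returns (odds_ratio, lower, upper) using the Woolf (log) method for the CI.
    """
    odds_ratio = (cleared_cmp * persisted_ref) / (persisted_cmp * cleared_ref)
    se_log = math.sqrt(1 / cleared_cmp + 1 / persisted_cmp
                       + 1 / cleared_ref + 1 / persisted_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# All subjects, Table 2: C/T+T/T (161 cleared, 418 persistent) versus
# the C/C reference (227 cleared, 202 persistent)
or_, lo, hi = crude_odds_ratio(161, 418, 227, 202)
print("crude OR = %.2f (95%% CI %.2f-%.2f)" % (or_, lo, hi))
```

An odds ratio of about one third for C/T+T/T relative to C/C corresponds to the roughly threefold higher odds of clearance among C/C carriers described in the text.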


[Figure 1 panels: all patients, European ancestry, African ancestry. x axis: genotypes (T/T, C/T, C/C); y axis: HCV clearance (%).]


Figure 1| Percentage of HCV clearance by rs12979860 genotype. Data are 
shown for all patients, as well as individuals of European ancestry and 
African ancestry separately. 


Table 3 | rs12979860 C allele frequency in worldwide populations

No.  Population               Region            n     C allele frequency (%)
1    Biaka Pygmies            Africa            66    23.5
2    Mbuti Pygmies            Africa*           39    23.1
3    Chagga                   Africa*           44    37.5
4    Ethiopian Jews           Africa*           21    54.8
5    Masai                    Africa*           20    40.0
6    Sandawe                  Africa*           37    44.6
7    Zaramo                   Africa            39    37.2
8    Hausa                    Africa*           38    31.6
9    Ibo                      Africa*           47    38.3
10   Yoruba                   Africa*           77    31.2
11   Danish                   Europe*           51    76.5
12   Finns                    Europe*           33    65.2
13   Hungarians               Europe*           142   65.1
14   Irish                    Europe*           113   73.9
15   Russians, Vologda        Europe*           48    61.4
16   Russians                 Europe*           32    64.1
17   Adygei                   Europe*           53    52.8
18   Chuvash                  Europe*           40    73.7
19   Khanty                   Europe            49    85.7
20   Komi                     Europe*           47    70.2
21   Roman Jews               Europe*           27    79.6
22   Sardinians               Europe            34    52.9
23   European-American        Europe            92    67.4
24   Druze                    Southwest Asia    96    77.6
25   Kuwaitis                 Southwest Asia    16    75.0
26   Yemenite Jews            Southwest Asia    41    69.5
27   Indians                  South Asia*       29    65.5
28   Kachari                  South Asia*       17    94.1
29   Thoti                    South Asia*       14    89.3
30   Cambodians               Southeast Asia*   24    97.9
31   Laotians                 Southeast Asia*   118   93.6
32   Chinese, Taiwan          East Asia*        47    93.6
33   Chinese, San Francisco   East Asia*        59    97.5
34   Hakka                    East Asia*        40    95.0
35   Ami                      East Asia         40    98.8
36   Atayal                   East Asia         40    100.0
37   Japanese                 East Asia*        50    91.0
38   Koreans                  East Asia*        54    93.5
39   Yakut                    East Asia*        50    90.0
40   Micronesians             Oceania*          36    98.6
41   Nasioi                   Oceania*          23    100.0
42   Samoans                  Oceania*          8     100.0
43   Papua New Guineans       Oceania           22    70.4
44   Pima, Mexico             North America     99    55.5
45   Mayans                   North America     52    37.5
46   Muscogees                North America     10    65.0
47   Ticuna                   South America     62    20.2
48   Karitiana                South America     54    82.4
49   Surui                    South America     47    [illegible]
50   Guihiba speakers         South America     12    62.5
51   Quechua                  South America     22    63.6

* Samples used in FST estimation.
Overall, the protective effect of C seems to be primarily recessive, as no
significant difference was observed between the C/T and T/T genotypes 
in individuals of African ancestry, European ancestry, or combined 
ethnic groups for clearance of HCV (data not shown), and C/C was 
consistently protective relative to C/T and/or T/T (Table 2 and Fig. 1). 
These results mirror the protective effect of the C/C genotype on SVR 
after HCV treatment observed previously’, where the protection con- 
ferred by the C allele also seemed to be recessive in both their Caucasian 
and African-American patients. 

Some individuals used in this study of HCV were co-infected with 
hepatitis B virus (HBV) and/or human immunodeficiency virus 
(HIV). To eliminate the possibility that co-infection with these 
viruses might confound the effect of rs12979860 on HCV outcome, 
analyses were performed using a multivariate model that included 
hepatitis B surface antigen status as a co-variate or stratifying by HIV 
status. Neither of these two chronic viral infections altered the effect 
of this locus on outcome of an acute HCV infection (Supplementary 
Tables 1 and 2). We also tested whether there were any differences in 
the effect of the protective rs12979860 C allele as a function of the 
route of HCV acquisition (plasma products versus injection drug 
use), but found no significant differences between the two groups 
(data not shown). Finally, adjusting by other host genetic factors that 
associate with clearance of HCV did not alter the protection con- 
ferred by the C/C genotype (Supplementary Information I). 

Patients with lower baseline HCV viral load respond more favourably 
to interferon-α treatment’’'*. However, little is known regarding the
impact of viral load during acute infection on spontaneous HCV 
clearance because very few HCV-infected individuals are identified 
and studied at this early phase’. We reasoned that the mechanism of 
protection of the C/C genotype might also extend to greater control of 
viral load in the chronic phase, but there was no correlation between 
rs12979860 genotype and viral load (Supplementary Information II and
Supplementary Fig. 1). However, differences in viral load assays used in 
the various cohorts may mask a small correlation, even after conversion 
to international units. 

The frequency of HCV clearance varies markedly across ethnic 
groups’, and differences in allele frequencies for the rs12979860 
SNP were observed in the present study (Table 1) and in another 
study”. Indeed, the observation that the C allele is less frequent among 
individuals of African descent relative to those of European descent 
might explain, in part, the observed discrepancy in the frequency of 
viral clearance in these two ethnic groups, where clearance occurs in 
36.4% of HCV infections in individuals of non-African ancestry, but 
only in 9.3% of infections in individuals of African ancestry’. To gain 
a greater insight into the geographic frequency distribution of the 
protective C allele, we genotyped 2,371 individuals from 51 popula- 
tions worldwide (Table 3 and Fig. 2a). 

The global distribution of allele frequencies shows a striking pattern in
which the allele leading to greater natural HCV clearance is nearly
fixed throughout east Asia, has an intermediate frequency in Europe,
and is the minor allele in Africa (Fig. 2a). A comparison of the
rs12979860 allele frequency diversity across 32 world populations
(as measured by FST) with that for 1,062 SNPs typed in these same
samples shows that the rs12979860 polymorphism has a greater
differential frequency (FST = 0.23) than most of the other polymorphisms
(mean FST = 0.12, standard deviation = 0.1), falling
within the upper 12.5 percentile of the distribution of FST values
(Fig. 2b). Notably, the high frequencies of the C allele found in north
and eastern Asian populations are not reflected in correspondingly 
high frequencies in their American relatives. Thus, if this locus has 
been under selection pressure, changes in the selective force that may 
be dependent on geographical location probably occurred after the 


Figure 2 | Sampling locations, allele frequencies and degree of regional
differentiation of the rs12979860 C allele. a, The numbers identifying
populations are given in Table 3. The pie charts show the frequency of the
C (green) and T (blue) alleles in each population sampled. b, Frequency
distribution of FST values for 1,062 SNPs from 32 of the samples grouped
into 6 regions (Africa, Europe, south Asia, southeast Asia, east Asia,
Oceania). The red arrow indicates the position of the estimated FST for
rs12979860.
colonization of the New World. That a common variant has such a
strong impact on hepatitis C may indicate that it has actually been 
under selection, adding to an emerging interpretation of genome- 
wide association studies whereby common variants rarely have large 
effects unless they were selected to do so™. 

The rs12979860 SNP is only 3 kb upstream of the IL28B gene,
which encodes the type III interferon IFN-λ3, and this SNP is in
strong linkage disequilibrium (r² > 0.85) with a non-synonymous
coding variant in the IL28B gene (213A>G, K70R; rs8103142)°.
Thus, it is possible that this 213A>G change alters the function of
IFN-λ3 and explains the genetic data described herein, but functional
data will be essential to define the precise biological mechanism. Type
III interferons include three members (IFN-λ1, IFN-λ2 and IFN-λ3),
and the genes encoding these molecules are clustered on human
chromosome 19q13 (refs 15, 16). They are structurally related to the
IL-10 superfamily of cytokines, but share functional characteristics
with the type I interferons (IFN-α and IFN-β) in that they are
induced by viral infections, signal through the JAK-STAT pathway,
and exhibit antiviral activity in vitro'*'®. IFN-λ1 has been shown to
exhibit dose- and time-dependent HCV inhibition, induce increases
in levels of interferon-stimulated genes, and enhance the antiviral
efficacy of IFN-α'”. It is possible that IFN-λ3 works through a similar
mechanism. In vitro, IFN-λ3 is at least as potent as IFN-λ1 in terms of
protecting HepG2 cells from lysis after infection with encephalomyocarditis
virus'®. Severe side effects in HCV treatment have been
observed with IFN-α therapy’, whereas type III IFN (IFN-λ) treatment
may exhibit fewer ‘interferon-like’ adverse effects because receptors
for the three family members are only expressed on a limited
number of cell types”®*. Whether IFN-λ may serve as an alternative
treatment modality for HCV infection is under investigation.

We have shown that the rs12979860 polymorphism upstream of 
IL28B which was previously associated with HCV treatment response 
also has a marked impact on natural clearance of HCV and may have 
been under selection in human history. It is now a priority to deter- 
mine the mechanisms through which IL28B promotes viral defence 
and the full range of viruses affected by these mechanisms. 


METHODS SUMMARY 


Subjects in this study were participants in one of six studies: (1) AIDS Link to 
Intravenous experience (ALIVE”; n = 281); (2) Multicentre Hemophilia Cohort
Study (MHCS”; n= 305); (3) Hemophilia Growth and Development Study 
(HGDS*; n= 106); (4) Correlates of Resolved Versus Low Level Viremic 
Hepatitis C Infection in Blood Donors study (REVELL; n= 85); (5) an HCV 
clinic cohort in Portland, Oregon, USA (n= 51); and (6) a cohort of injection 
drug users from the UK (n = 180) (see Methods for details). Fifty-one worldwide 
populations (n= 2,371) from the ALlele FREquency Database (ALFRED)™ were 
also genotyped in this study. Details of sampling and ethnographic information 
for these populations can be found at http://alfred.med.yale.edu/. All populations 
were in Hardy-Weinberg equilibrium with the exception of one, the Finnish 
sample (n= 33, P= 0.05). Genotyping was performed using the ABI TaqMan 
allelic discrimination kit and the ABI7900HT Sequence Detection System 
(Applied Biosystems). SAS 9.1 (SAS Institute) was used for statistical analyses. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Study subjects. The AIDS Link to Intravenous experience (ALIVE) is an ongoing 
study of injection drug users*'. The Multicentre Hemophilia Cohort Study 
(MHCS) is a prospectively followed cohort of patients with coagulation disorders 
from 16 treatment centres’. The Hemophilia Growth and Development Study 
(HGDS) is a continuing study of children and adolescents with hemophilia”. The 
REVELL study draws both resolved and chronic HCV infections from a large 
blood bank network consisting of 17 blood centres in the western and southern 
United States. Individuals with viral clearance in the ALIVE, MHCS and HGDS 
cohorts were matched to two individuals with viral persistence within the same 
cohort based on HIV status, gender and ethnicity (African-American, European- 
American, other). All of the individuals in the HCV clinic cohort cleared the virus, 
so they were not matched. Participants in the REVELL cohort were all HIV 
negative and HBV negative. Participants in the UK cohort were individuals 
referred to hepatology clinics. They were all Caucasian, HBV negative and HIV 
negative. There was no significant difference in the results when matched and 
unmatched cohorts were analysed separately (Supplementary Table 3). 

HCV testing. HCV infection was established by a second- or third-generation 
enzyme immunoassay (EIA) (Ortho Diagnostics Systems). Individuals with sub- 
sequent negative HCV RNA tests were confirmed anti-HCV positive by a third- 
generation EIA test or a recombinant immunoblot assay (RIBA) that was separated 
from the first by a minimum of 6 months. HCV RNA was assessed by a branched 
DNA (bDNA) assay (Quantiplex HCV RNA 2.0 assay; Chiron Corporation), a 
qualitative HCV COBAS AMPLICOR system (COBAS AMPLICOR HCV; Roche 
Diagnostics), or by transcription-mediated amplification (TMA) (Novartis and 
Gen Probe Inc.). Those subjects with a sample below the limit of detection by the 
bDNA assay (potential subjects with HCV recovery) had a repeat sample separated 
by 6 months from the first one tested with the qualitative COBAS. HCV infection
in blood donors (REVELL study) was established by third-generation EIA and
confirmed using RIBA. HCV RNA status was established using nucleic acid amp- 
lification testing (NAT) of minipools representing 16 donation samples (Procleix 
HIV-1/HCV Assay, Gen-Probe, Novartis). A reactive minipool result triggered 
NAT of the individual donations comprising the pool in order to identify the 
NAT-reactive donation. Residual volume after NAT screening from all antibody 
positive/RNA negative donors was retested by duplicate undiluted HCV RNA 
testing using TMA. Individuals with HCV clearance had undetectable HCV 
RNA in serum or plasma at two time points separated by a minimum of 6 months. 
Persistently infected individuals had detectable HCV RNA in serum or plasma at 
two time points separated by a minimum of 6 months. 

Statistical analysis. SAS 9.1 (SAS Institute) PROC FREQ was used to compute 
frequencies and Fisher’s exact test P-values on categorical variables. PROC 
LOGISTIC was used to obtain odds ratios and 95% confidence intervals. 
PROC MEANS was used to calculate mean age. Analyses were performed with 
all ethnic groups combined and individuals of European and African ancestry 
separately. All analyses were adjusted for study groups and for ethnicity in three 
categories (European ancestry, African ancestry and others) when all ethnic 
groups were combined. Statistical significance refers to two-sided P-values of 
<0.05. 

FST analysis. Population samples were checked in ALFRED to identify all SNPs
for which individuals had been typed. For 32 samples an additional 1,062 SNPs
were available. Data were pooled by regional affiliation (see Table 3) into six
groupings allowing improved frequency estimation: Africa, Europe, south Asia,
southeast Asia, east Asia and Oceania. The extent of regional differentiation, FST,
was determined for each of the individual SNPs and its distribution plotted. The
FST of similarly pooled rs12979860 allele frequencies was compared to that found
at other loci.
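
As a rough illustration of regional differentiation at a single biallelic SNP, the sketch below computes Wright's FST as (HT − HS)/HT from one allele frequency per region, weighting regions equally. The estimator actually used is not stated in the text, so this formula, the function name and the six example frequencies are all assumptions made for illustration.

```python
def fst_biallelic(region_freqs):
    """Wright's FST for one biallelic locus from per-region allele frequencies.

    H_S is the mean expected heterozygosity within regions and H_T the expected
    heterozygosity at the mean allele frequency; regions are weighted equally.
    """
    k = len(region_freqs)
    p_bar = sum(region_freqs) / k
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # locus is fixed everywhere, so there is no differentiation
    h_within = sum(2.0 * p * (1.0 - p) for p in region_freqs) / k
    return (h_total - h_within) / h_total


# Hypothetical pooled C allele frequencies for six regions
example = [0.36, 0.68, 0.80, 0.95, 0.94, 0.99]
print(round(fst_biallelic(example), 3))
```

Repeating the calculation over every typed SNP and ranking the value for rs12979860 within that distribution corresponds to the comparison described above.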
A genome-wide linkage and association scan reveals novel loci for autism


Lauren A. Weiss’**+, Dan E. Arking** & The Gene Discovery Project of Johns Hopkins & the Autism Consortiumt 


Although autism is a highly heritable neurodevelopmental dis- 
order, attempts to identify specific susceptibility genes have thus 
far met with limited success’. Genome-wide association studies 
using half a million or more markers, particularly those with very 
large sample sizes achieved through meta-analysis, have shown 
great success in mapping genes for other complex genetic traits. 
Consequently, we initiated a linkage and association mapping study 
using half a million genome-wide single nucleotide polymorphisms 
(SNPs) in a common set of 1,031 multiplex autism families (1,553 
affected offspring). We identified regions of suggestive and signifi- 
cant linkage on chromosomes 6q27 and 20p13, respectively. Initial 
analysis did not yield genome-wide significant associations; 
however, genotyping of top hits in additional families revealed an 
SNP on chromosome 5p15 (between SEMA5A and TAS2R1) that
was significantly associated with autism (P = 2 × 10⁻⁷). We also
demonstrated that expression of SEMA5A is reduced in brains from
autistic patients, further implicating SEMA5A as an autism susceptibility
gene. The linkage regions reported here provide targets for
rare variation screening whereas the discovery of a single novel 
association demonstrates the action of common variants. 

For a high-resolution genetic study of autism, we selected families 
with multiple affected individuals (multiplex) from the widely 
studied Autism Genetic Resource Exchange (AGRE) and US 
National Institute for Mental Health (NIMH) repositories (Sup- 
plementary Methods and Supplementary Table 1). Although the 
phenotypic heterogeneity in autism spectrum disorders (ASDs) is 
extensive, in our primary screen we selected families in which at least 
one proband met Autism Diagnostic Interview-Revised (ADI-R) cri- 
teria for diagnosis of autism and included additional siblings in the 
same nuclear family affected with any autism spectrum disorder. We 
previously reported an early copy number analysis that revealed a 
significant role for microdeletion and duplication of 16p11.2 in ASD 
causation’; here, we present extensive genome-wide linkage and asso- 
ciation analyses performed with this high density of SNPs and 
identify independent and novel genome-wide significant results by 
both linkage and association analyses. 

We combined families and samples from two sources for the primary
genetic association screen. The AGRE sample included nearly 3,000 
individuals from over 780 multiplex autism families in the AGRE 
collection’ genotyped at the Broad Institute on the Affymetrix 5.0 
platform, which includes over 500,000 SNPs. The NIMH sample 
included a total of 1,233 individuals from 341 multiplex nuclear 
families (258 of which were independent of the AGRE sample) geno- 
typed at the Johns Hopkins Center for Complex Disease Genomics on 
Affymetrix 5.0 and 500K platforms, including the same SNP markers as 
were genotyped in the AGRE sample. 


Before merging, we carefully filtered each data set separately to ensure 
the highest possible genotype quality for analysis, because technical 
genotyping artefacts can create false positive findings. We therefore 
examined the distribution of χ² values for the highest quality data,
and used a series of quality control (QC) filters designed to identify a 
robust set of SNPs, including data completeness for each SNP, 
Mendelian errors per SNP and per family, and a careful evaluation of 
inflation of association statistics as a function of allele frequency and 
missing data (see Methods). As 324 individuals were genotyped at both 
centres, we performed a concordance check to validate our approach. 
After excluding one sample mix-up, we obtained an overall genotype 
concordance between the two centres of 99.7% for samples typed on 
500K at Johns Hopkins University and 5.0 at the Broad Institute and 
99.9% for samples run on 5.0 arrays at both sites. The combined data set, 
consisting of 1,031 nuclear families (856 with two parents) and a total of 
1,553 affected offspring, was used for genetic analyses (Supplementary 
Table 1). These data were publicly released in October 2007 and are 
directly available from AGRE and NIMH. 

For linkage analyses, the common AGRE/NIMH data set was further 
merged with Illumina 550K genotype data generated at the Children’s 
Hospital of Philadelphia (CHOP) and available from AGRE, adding 
~300 nuclear families (1,499 samples). We used the extensive overlap 
of samples between the AGRE/NIMH and the CHOP data sets (2,282 
samples) to select an extremely high quality set of SNPs for linkage 
analysis. Specifically, we only included SNPs genotyped in both data 
sets with >99.5% concordance and ≤1 Mendelian error. 

Linkage analysis involving high densities of markers, where clusters 
of markers are in linkage disequilibrium (LD), can falsely inflate the 
evidence for genetic sharing among siblings when neither parent is 
genotyped’. To alleviate these concerns, we analysed a pruned set of 
16,311 highly polymorphic, high-quality autosomal SNPs which were 
filtered to remove any instances in which two nearby markers were 
correlated with r² > 0.1, providing a marker density of ~0.25 cM (see 
Methods). In this analysis of 878 families, four genomic regions 
showed LOD scores in excess of 2.0 and one region, 20p13, exceeded 
the formal genome-wide significance threshold of 3.6 (ref. 5) (maximum 
LOD, 3.81; Fig. 1a and Supplementary Table 2). Restricting 
analysis to only those families with both parents genotyped (784 
families) showed that these results are not an artefact of missing 
parental data (Fig. 1b). We further tested the stability of these results 
by varying the recombination map and halving the marker density by 
placing every other marker into two non-overlapping SNP sets 
(Methods Summary); all analyses showed consistent and strong link- 
age to the same regions (data not shown). 
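To put the LOD thresholds quoted above on a more familiar scale, a LOD score can be converted to an approximate pointwise P-value. The sketch below assumes the LOD behaves as a one-degree-of-freedom likelihood-ratio statistic tested one-sided, which is a textbook approximation and not necessarily the calculation used by the linkage software in this study; the function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise P-value for a LOD score.

    Assumption: 2 * ln(10) * LOD is treated as a chi-square statistic
    with 1 degree of freedom, and the test is one-sided.
    """
    chi2 = 2.0 * math.log(10.0) * lod           # likelihood-ratio chi-square
    z = math.sqrt(chi2)                         # equivalent normal deviate
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)


# LOD values mentioned in the text: suggestive (2.0), the genome-wide
# threshold (3.6) and the observed maximum at 20p13 (3.81).
for lod in (2.0, 3.6, 3.81):
    print(f"LOD {lod:.2f} -> pointwise P ~ {lod_to_pointwise_p(lod):.1e}")
```

Under these assumptions the threshold of 3.6 corresponds to a pointwise P-value of roughly 2 × 10⁻⁵, which is why a single region exceeding it is treated as significant across the whole genome.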

Figure 1 | Genome-wide linkage results. a, The genome-wide linkage results 
are shown, with the orange line indicating non-parametric linkage (NPL) 
LOD = 3 and the yellow line indicating NPL LOD = 2. b, Four 
chromosomes with LOD > 2. The black and blue lines indicate results from 
families with both parents genotyped and all families, respectively. The green 
line indicates information content (right-hand y axis). The red circle 
indicates the position of the centromere. 

We used the transmission disequilibrium test (TDT) across all SNPs 
passing quality control in the complete family data set for association 
analyses as the TDT is not biased by population stratification. We 
estimated a threshold for genome-wide significance using both 
permutation (P < 2.5 × 10⁻⁷) and estimating the effective number of tests 
(P < 3.4 × 10⁻⁷), and use the more conservative here (see Methods). 
No SNP met criteria for genome-wide significance at P < 2.5 × 10⁻⁷. 
However, we observed an excess of independent regions associated at 
P < 10⁻⁵ (6 observed versus 1 expected) and P < 10⁻⁴ (30 observed 
versus 15 expected) despite the lack of overall statistical inflation 
(λ = 1.03, Supplementary Fig. 1), suggesting that common variants 
in autism exist, but that our initial scan did not have sufficient 
statistical power to identify them definitively (Table 1 and Supplementary 
Fig. 2). 

For the TDT associations with P < 10⁻⁴, we additionally used the 
cases that were excluded from the TDT due to missing parental data. We 
matched 90 independent and unrelated cases with 1,476 NIMH control 
samples genotyped on the Affymetrix 500K arrays, and performed 
case-control association analysis (Supplementary Table 3), combining 
these results with the TDT data. Promisingly, we now observed eight 
SNPs (in seven independent regions) with association at P < 10⁻⁵ 
(Table 1). Of note, comparing Caucasian with non-Caucasian samples 
in the AGRE/NIMH data set, we did not observe significant 
heterogeneity for top results. 

Our strongest associations were at chromosome 4q13 (rs17088254, 
P = 8.5 × 10⁻⁶) between CENPC1, a centromere autoantigen, and 
EPHA5, an ephrin receptor potentially involved in neurodevelopment; 
at 5p15 (rs10513025, P = 1.7 × 10⁻⁶) in the EST DB512398, 
located between SEMA5A and TAS2R1; at 6p23 (rs7766973, 
P = 6.8 × 10⁻⁷) in JARID2, an orthologue of the mouse jumonji gene, 
encoding a nuclear protein essential for embryogenesis, especially 
neural tube formation; at 9p24 (rs4742409, P = 7.9 × 10⁻⁶) between 
PTPRD, a protein tyrosine phosphatase involved in neurite outgrowth, 
and JMJD2C (also called KDM4C), a jumonji-domain containing 
protein involved in tri-methyl-specific demethylation; at 9q21 
(rs952834, P = 7.8 × 10⁻⁶) between ZCCHC6, a zinc finger and 


Table 1 | Top TDT results and replication data 

Locus Scan Replication Meta-analysis 
Chromosome Position SNP LD, proxy T U OR P P (with case-control) T U OR P (1-sided) P (meta) P (proxy) 

4 68019960 1s17088254 - 137 219 063 14x10° 8&5x10° 48 38 3 NA 0.011 48x10°3 

4 68189460 1s2632453 r =0.67, 171 245 0.70 29x10* 2.4x10~% 248 234 06 NA 0.022 - 
rs17088254 

5 9676622 s10513025  - 84 152 055 96x10 ° L7x10°-& 152 199 0.76 61x10? 2.1107’ 

6 15365718 rs13208655 1 =0.74, NA NA NA NA NA 829 831 1.00 048 0.48 7 
rs7766973 

6 15376030 rs7766973 = 631 811 0.78 21x10 ° 68x10” 139 142 098 043 2.0x10* 28x10 % 

9 7763723 ~— rs4742408 = 591 739 080 49x10° 2.7x10% 241 224 .08 A 0.030 

9 7764180 rs4742409—- 499 645 0.77 16x10 ° 7.9x10°° 77 87 089 0.22 16x10 * 36x10“ 

9 7764774 — +s6477233 P=06, NA NA NA NA NA 734 752 0.98 0.32 0.32 7 
rs4742409 

9 86471331 1s952834 = 656 825 0.80 1x10°° 7.8x 107° 173 160 .08 A 54x103 - 

10 68842909 1s7923367 = 89 160 0.56 68X10 ° 3.4x10°° 18 25 0.72 0.14 41x10°° = 

11 22775950 1s12293188 - 449 327 1.37 2X10° 11x10 486 513 0.95 A 3.0x107 - 

11 22785182 1s16910190 - 421 308 137 28x10 ° 14x10° 55 67 0.82 A 0.014 7 

11 22785488 1s16910194 - 444 330 135 42x10° 3.7x10°° 80 75 07 0.34 28x10 4% - 

11 22791645 1s3763947 = 429 320 134 68x10 ° 34x10° 57 57 00 A 24x103 - 

Top results from the combined TDT and case-control analysis are shown (P < 10⁻⁵), with replication data, where it exists. For Sequenom genotyping that used a proxy SNP, that SNP and its LD (r²) 
with the SNP of interest is shown. Transmitted (T) and untransmitted (U) counts and odds ratios (OR) for the minor allele are shown for each SNP. Replication results are shown for additional autism 
family data using Affymetrix and Sequenom genotyping technology. The meta-analysis P-value is shown as is the P-value for meta-analysis where proxy SNP data was included. Bold font: P < 10⁻⁵ 
TDT/case-control analysis, P < 0.05 replication, P < 2.5 × 10⁻⁷ meta-analysis. NA, not applicable. 

CCHC domain containing protein, and GAS1, growth-arrest-specific 
protein; at 10q21 (rs7923367, P = 3.4 × 10⁻⁶) in CTNNA3, α3 catenin, 
which may be involved in the formation of stretch-resistant cell-cell 
adhesion complexes; and two SNPs on 11p14 (rs12293188, 
P = 1.1 × 10⁻⁶; rs16910194, P = 3.7 × 10⁻⁶) in GAS2, a caspase-3 
substrate that has a role in regulating microfilament and cell shape 
changes during apoptosis and can modulate cell susceptibility to 
p53-dependent apoptosis by inhibiting calpain activity (Table 1). 

To confirm whether any of these top results might indicate true 
susceptibility loci, we attempted to replicate these signals, as well as 
others with P < 10⁻⁴ in the initial TDT that met stringent genotyping 
quality criteria (Supplementary Table 3). We used several data 
sources to replicate the association results. First, we used additional 
autism family samples (318 trios collected by investigators of the 
Autism Consortium and in Montreal) with genome-wide 
Affymetrix 5.0/500K array data also genotyped at the Genetic 
Analysis Platform of the Broad Institute using the same conditions, 
QC and analysis pipelines (Methods). 

Second, independent Autism Genome Project (AGP) families, 
along with a set of Finnish families and a set of Iranian trios, were 
used for replication of our top findings (n= 1,755 trios). Two 
Sequenom replication pools were designed, attempting to include as 
many of the regions associated at P < 10⁻⁴ as possible. The full set of 
SNPs considered and those successfully genotyped are shown in 
Supplementary Table 3, with linkage disequilibrium (r²) noted for 
SNPs selected as proxies for Affymetrix markers. One of the eight 
SNPs with P < 10⁻⁵ (rs10513025) that failed in this Sequenom assay 
was subsequently replaced in a subset of AGP samples with a TaqMan 
assay. This assay showed 99.89% concordance with Affymetrix geno- 
types in the overlapping AGRE-NIMH samples (2,797 out of 2,800 
concordant genotypes), with manual review of the Affymetrix geno- 
type calls also confirming the marker to be of extremely high quality 
(Supplementary Fig. 4). In the independent replication effort, only 
rs10513025 was associated with P< 0.01 (Table 1). 

Combining the scan and replication data, only rs10513025 met cri- 
teria for genome-wide significance defined by LD and permutation 
analyses (P < 2.5 × 10⁻⁷). To increase coverage of this region and fill 
in missing genotypes and SNPs that failed quality control, we performed 
imputation analysis. rs10513026 was highly (but not perfectly) corre- 
lated to the replicated chromosome 5 SNP (rs10513025) and showed 
even stronger association than originally observed with rs10513025 
(Supplementary Fig. 3). These and several other promising SNPs were 
directly genotyped in the original scan samples and, in fact, showed 
higher levels of significance (Table 2). Direct genotyping confirmed that 
rs10513026 showed stronger association than rs10513025 (P-value 
4.5 × 10⁻⁶ versus 9.8 × 10⁻⁶ in the re-genotyped scan trios), increasing 
the significance of this observation further. Several other promising 
results from this analysis were genotyped in a subset of scan samples, 
and, of note, the top SNP in imputation analysis (rs10874241, 
imputation P = 9.8 × 10⁻⁷, odds ratio (OR) = 0.43) showed consistent results 
(OR = 0.4, P = 4 × 10⁻⁷) when directly genotyped (Supplementary 
Table 4). 

rs10513025 and neighbours are on chromosome 5p15 in a region 
of LD containing several other ESTs and TAS2R1, a bitter taste 
receptor (Supplementary Fig. 3). The SNPs are ~80 kb upstream of 
semaphorin 5A (SEMA5A), a gene implicated in axonal guidance and 
known to be downregulated in lymphoblastoid cell lines of autism 
cases versus healthy controls. An independent study at Children’s 
Hospital Boston using whole blood (S.W.K., L.K. and Z.K., 
manuscript in preparation) confirms this lower expression (P = 0.0034) of 
SEMA5A in autism cases versus controls. To evaluate the role of this 
locus in autism pathogenesis more completely, we evaluated the 
entirety of 5p15 for copy-number variation. Despite excellent probe 
coverage throughout the locus, no common or rare copy number 
variants were detected in the entire AGRE scan in the region of LD 
surrounding the associated SNPs and the entire SEMA5A locus 
including 250 kb up- and downstream (see Methods). 


Table 2 | Chromosome 5p15 SNPs 
SNP         Chromosome  Position  MAF    OR      P            Replication P 
rs10513025  5           9676622   0.041  0.5526  9.58 × 10⁻⁶  0.006059 
rs10513026  5           9677106   0.040  0.53    4.50 × 10⁻⁶  NA 
rs16883317  5           9701592   0.038  0.53    7.20 × 10⁻⁶  NA 

Three SNPs in the chromosome 5p15 association locus genotyped by Sequenom iPLEX are 
shown, with minor allele frequency (MAF), odds ratio (OR) and P-value in the AGRE and NIMH 
sample, as well as replication data from all available samples for rs10513025 (see Methods). 
NA, not applicable. 

To test directly SEMA5A expression in brains from autistic 
patients, tissue samples from 20 cases with a primary diagnosis of 
autism and 10 controls were obtained through the Autism Tissue 
Program and the Harvard Brain Bank. Samples were dissected from 
Brodmann area 19 of the occipital lobe cortex, a region demonstrat- 
ing differences between autism cases and controls in functional 
imaging studies, and subjected to quantitative PCR. SEMA5A 
expression, determined relative to MAP2 (neuron specific), was sig- 
nificantly lower in autism brains than controls after adjustment for 
the age at brain acquisition, post-mortem interval and sex 
(P = 0.024, Fig. 2). 

We also analysed our data for association signals at candidate 
genes or regions with previous evidence of involvement in autism. 
Although there are few well-replicated associations of biological can- 
didate genes, there are many rare genetic variants, diseases and syn- 
dromes associated with autism. Most of these loci have not been 
systematically assessed to see whether common variation in the gene 
or region might contribute to autism. We assessed four categories of 
candidate loci: (1) genes with previous evidence for association with 
common variation; (2) genes implicated by rare variants leading to 
autism; (3) genes causing Mendelian diseases associated with autism; 
and (4) regions where microdeletion or microduplication syndromes 
are associated with autism. For each gene, we included all SNPs 
passing basic quality criteria within 2 kb of the transcript. 

Overall, there were no compelling results in these sets (all P > 10⁻⁴), 
considering the number of SNPs tested, and only two regions met 
criteria for region-wide (only SNPs in that gene/region considered) 
or set-wide (for example, all candidate regions in the set of common 
variant genes considered) significance by permutation testing 
(Supplementary Table 5). MECP2 (Rett syndrome) met criteria for 
region-wide association (P = 0.0071, 5 SNPs, Supplementary Table 
5). Moreover, the Williams syndrome region was borderline for set-wide 
significance (P = 0.051, Supplementary Table 5). One SNP in 
particular showed strong association (rs2267831, P = 0.00012, 
OR = 0.56); as this was a rare SNP with undertransmission of the 
minor allele, we genotyped a subset of families and observed similar, 
slightly less significant distortion (OR = 0.61). The SNP is located 
within GTF2IRD1, a transcription factor within the critical region 
for the Williams syndrome cognitive behavioural profile. 

Figure 2 | SEMA5A expression in autism brains. SEMA5A gene expression 
is shown relative to MAP2. Diamonds indicate individual expression levels 
for each sample; error bars indicate standard error (s.e.). 

There seems to be little overlap between the regions of strongest 
linkage and association in this study. A more detailed assessment of 
SNP and haplotype association in the most significant linkage regions 
did not yield common variation that could explain the evidence for 
linkage (Supplementary Table 6). This is an expected outcome if 
linkage signals arise from rare, high penetrance variation (for which 
the genotyping arrays do not offer an adequate proxy) whereas asso- 
ciation is sensitive to common variation with lower penetrance (that 
cannot be detected by linkage). For example, a 0.3% variant that 
increases risk by tenfold would readily be picked up by this informa- 
tive linkage scan, but would very likely not be assessed by the 
common SNPs on the Affymetrix 5.0 array; by contrast, the modest 
and protective impact of the 5% variant at the SEMA5A rs10513025 
creates no detectable excess allele sharing among siblings but is 
strongly detected by association. 

During review of this manuscript, another genome-wide association 
study (GWAS) was published which identified significant association 
to SNPs on chromosome 5p14. Although there was significant overlap 
between study samples, each of these scans contained a large set of 
unique families, so we sought to evaluate independent evidence of the 
top SNP (rs4307059) reported at 5p14. This SNP happens to be directly 
genotyped by both Affymetrix and Illumina platforms. We have a sizable 
number (n = 796) of affected subjects with two parents genotyped (and 
of predominantly similar European background). However, we observed no 
support for association at this locus (T:U 354:335 in favour of the 
minor allele, a trend in the opposite direction to that reported). 

Autism genes have been difficult to identify, despite the high 
heritability of autism spectrum disorders. Up to 10% of autism cases may 
be due to rare sequence and gene dosage variants, for example, mutations 
in NRXN1, NLGN3/NLGN4, SHANK3 and copy number variants at 15q11-q13 and 
16p11.2. A number of diseases of known aetiology, including Rett 
syndrome, fragile X syndrome, neurofibromatosis type I, tuberous 
sclerosis, Potocki-Lupski syndrome, and Smith-Lemli-Opitz syndrome are 
also associated with autism. However, the remaining 90% of autism 
spectrum disorders, although highly familial, have unknown genetic 
aetiology. A genome-wide linkage study using the Affymetrix 10K SNP 
array to genotype over 1,000 families found no genome-wide significant 
linkage signals, but documented suggestive linkage at 11p12-p13 and 
15q23-q25 and reinforced a modest role for rare copy-number variants. 

Many complex diseases have recently had great success with 
GWAS approaches, but most identified modest effects with odds 
ratios less than 1.3 (http://www.genome.gov/26525384). Our asso- 
ciation analysis has excellent statistical power (>80%) to find effects 
of relatively common alleles (0.01—0.25 in frequency) explaining 1% 
of the variance in autism at the genome-wide significant level. It is 
near-perfectly powered for alleles of SNPs present on the array (or 
perfectly proxied) down to 1% at the replication cutoff P < 10⁻⁴, 
assuming additive background genetic variance of 0.8 and shared 
environmental variance of 0.05 with prevalence of 0.006. One of 
the advantages of a family-based association test is that we avoid false 
positive results generated by population stratification, and in addi- 
tion, we have performed careful quality control to reduce the chances 
of being misled by technical artefacts. However, the SNP coverage of 
the Affymetrix 5.0 chips is incomplete; in fact, a recent re-sequencing 
survey suggests that these arrays assay only 57% of variants with 
minor allele frequency (MAF) >5% at r² = 0.8 (ref. 15). We therefore 
cannot exclude untested variation of large effect in autism. The linkage 
analysis, assuming a fully informative marker in 800 sibling pairs, 
should detect sibling allele sharing of at least 55.125%. 
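The 55.125% figure can be reproduced with a short calculation. The sketch below assumes a mean-sharing test in which each fully informative sibling pair shares 0, 1 or 2 alleles identical by descent with probabilities 1/4, 1/2 and 1/4 under the null, so the proportion shared has mean 0.5 and variance 1/8 per pair. The normal deviate of 4.1 is an assumption chosen because it reproduces the quoted value; the text does not state which deviate was used, and the function name is hypothetical.

```python
import math


def min_detectable_sharing(n_sib_pairs: int, z_threshold: float) -> float:
    """Smallest mean IBD sharing proportion that reaches z_threshold.

    Assumption: fully informative marker, so the per-pair proportion of
    alleles shared IBD has null mean 0.5 and null variance 1/8.
    """
    per_pair_variance = 0.125
    standard_error = math.sqrt(per_pair_variance / n_sib_pairs)
    return 0.5 + z_threshold * standard_error


# 800 sibling pairs as stated in the text; z = 4.1 is an assumed threshold.
print(f"{min_detectable_sharing(800, 4.1):.5f}")
```

With 800 pairs the standard error of mean sharing is 0.0125, so a deviate of 4.1 sits 5.125 percentage points above the null expectation of 50%.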

Our linkage analysis revealed two novel regions of linkage, 6q27 
(LOD = 2.94) and 20p13 (LOD = 3.81), with the latter formally 
exceeding the threshold for genome-wide significance. There is some 
overlap between the more modest signals (LOD >2 on chromosome 
15 and chromosome 17) and previously reported suggestive linkage 
signals, but little overlap with the most promising regions of common 
SNP association. This suggests that the regions of the genome showing 
linkage may harbour rare variation, potentially with allelic 
heterogeneity across families, which would require re-sequencing to 
uncover, as has been demonstrated for the 7q35 region. Interestingly, 
several of these regions overlap with rare syndromes or genetic events 
known to be strong risk factors for autism. For example, an autism 
case with a translocation disrupting 15q25 has been reported, whereas 
the 17p region overlaps the Smith-Magenis and Potocki-Lupski syndrome 
region. 

The initial TDT analysis of this large multiplex autism data set did 
not reveal any associations meeting criteria for genome-wide signifi- 
cance, suggesting that there are not many common loci of moderate to 
large effect size even in a highly heritable disorder like autism. 
Nevertheless, replication data in our study identified a novel locus with 
genome-wide significant evidence for association to autism. In addi- 
tion, several other SNPs in the region show similarly strong association 
(rs10513026, rs16883317). We ascertained a large replication sample 
from independent family studies with a replication at P = 0.0061 and 
meta-analysis showed this association (P = 2.12 × 10⁻⁷) to meet 
criteria for genome-wide association in our experiment. This region on 
chromosome 5 harbours the gene encoding the bitter taste receptor, 
TAS2R1, and several uncharacterized ESTs and is adjacent to SEMA5A, 
a member of the semaphorin axonal guidance protein family, which 
has shown downregulated expression in transformed B lymphocytes 
from autism samples. We have further extended this finding by 
directly demonstrating lowered SEMA5A gene expression in autism 
brain tissue. This is an attractive candidate gene given that its protein is 
a bi-functional guidance molecule, which is both attractive and inhibitory 
for developing neurons. Interestingly, the SEMA5A receptor is 
plexin B3, which also signals through the tyrosine kinase MET, a 
previously reported autism susceptibility gene. 

Finally, we investigated whether different classes of genes or 
regions—loci previously implicated by functional or positional can- 
didate gene association studies, rare variants implicated in autism, 
Mendelian disorder genes with association to autism, or regions of 
copy number variation associated with autism—showed association 
with common alleles included in our marker set. Although there were 
several nominally significant associations, only the Williams syn- 
drome region (one SNP in GTF2IRD1) was borderline statistically 
significant (P= 0.051), after correcting for the microdeletion/ 
duplication syndrome regions tested. In the category of Mendelian 
disorders associated with autism, MECP2, the gene for Rett syn- 
drome, showed region-wide statistical significance. These results 
raise the possibility that Rett and Williams syndrome genes may 
contribute more generally to autism spectrum disorders. Although 
the genes in which common variation has been reported to be asso- 
ciated with autism do not show evidence for association, this cannot 
be interpreted as failure to replicate previous results in all cases, 
because much of the variation reported as associated is not captured 
on the Affymetrix platform (for example, length polymorphisms, 
microsatellites, untagged SNPs such as the promoter variant at 
MET). Instead, despite a high density of markers, our results suggest 
that we did not identify additional common variation with evidence 
for association. Overall however, our results indicate that these pos- 
tulated candidate regions, mostly based on rare events known to 
cause autism, are not among the regions with common alleles having 
the strongest risk effects for autism. 

Interestingly, both our linkage and association analyses, from the 
primary and replication analyses, suggest that low-frequency (<0.05) 
minor alleles may be common in autism. Intriguingly, the linkage 
studies reveal low-frequency susceptibility alleles whereas the asso- 
ciation analyses have uncovered rare alleles with odds ratios less than 
0.6 (the common alleles in the population associated with increased 
risk for autism). This can occur when the ancestral allele, that was 
previously neutral or beneficial, now has detrimental effects revealed 
by an evolutionarily recent environment, or when a pleiotropic function 
of the allele is selectively advantageous, or when this variation is 
hitch-hiking on a shared haplotype with a distinct beneficial allele. 
However, it is worth noting that our study design of ascertaining 
multiplex families is not well powered to identify loci under this 
genetic model of common major alleles associated with autism sus- 
ceptibility. 

We report genome-wide significant linkage as well as an asso- 
ciation of common genetic variation with autism. Our results will 
require follow-up to identify the functional variation in the linkage 
and association regions that we report here and to probe the func- 
tions of the relatively unstudied transcripts implicated. These results 
could provide completely novel insight into the biology and patho- 
genesis of a common neurodevelopmental disorder. 


METHODS SUMMARY 
Samples and genotyping. Our primary samples are from the AGRE and NIMH 
Repositories. Replication with Affymetrix technology included NIMH controls, 
families collected by members of the Autism Consortium, and families ascer- 
tained from Montreal. Replication with Sequenom technology included the 
Autism Genome Project, Finnish, and Iranian subsets of Autism Consortium 
investigator-collected families. Details of the ascertainment for each sample 
collection, genotyping and quality control processes can be found in Methods. 
Linkage and association analysis. The linkage analysis was conducted with a 
pruned autosomal SNP set (see Methods for details of marker selection) and 
chromosome X set (670 SNPs) using the cluster option in MERLIN/MINX (r² < 
0.1), yielding 16,581 independent markers. We performed confirmatory 
analysis on non-overlapping data sets by selecting alternative SNPs. 

Association analysis was performed in PLINK. The basic association test was 
a transmission disequilibrium test (TDT), and the extra cases versus controls 
analysis was performed by allelic association, after excluding cases that were not 
well matched to the controls, based on multi-dimensional scaling (λ < 1.1). 
Combining the TDT and case-control tests was performed using expected and 
observed allele counts by the formula 
$Z_{\mathrm{meta}} = (\sum \mathrm{exp} - \sum \mathrm{obs})/\sqrt{\sum \mathrm{var}}$. 
Meta-analysis of AGRE/NIMH and replication data was performed using the statistic 
$(Z_{\mathrm{AGRE/NIMH}} + Z_{\mathrm{replication}})/\sqrt{2}$. Gene-set analysis was performed in PLINK using 
the set-based TDT. Imputation-based association was performed in PLINK with 
the proxy-tdt command, using the HapMap CEU parent samples as the reference 
panel and information score >0.8. Haplotype analysis in the linkage regions was 
performed using 5-SNP sliding windows, as implemented in PLINK hap-tdt. See 
Methods for details of determination of genome-wide significance thresholds. 
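As an illustration of the two statistics named above, the following sketch computes a basic TDT from transmitted and untransmitted minor-allele counts and combines two Z-scores with equal weights. It uses the standard McNemar-type form of the TDT and is not the PLINK implementation; the function names are hypothetical, and the example counts (T = 84, U = 152) are those listed for rs10513025 in the scan columns of Table 1.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """Return (odds ratio, chi-square, two-sided P) for one SNP.

    Standard TDT: chi-square = (T - U)^2 / (T + U) with 1 d.f.,
    and the odds ratio for the minor allele is T / U.
    """
    odds_ratio = transmitted / untransmitted
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 d.f.
    return odds_ratio, chi2, p_value


def combine_z(z_scan: float, z_replication: float) -> float:
    """Equal-weight combination (Z_scan + Z_replication) / sqrt(2)."""
    return (z_scan + z_replication) / math.sqrt(2.0)


odds_ratio, chi2, p_value = tdt(84, 152)
print(f"OR = {odds_ratio:.4f}, chi2 = {chi2:.2f}, P = {p_value:.1e}")
```

Because the TDT compares alleles transmitted and not transmitted by heterozygous parents within the same families, the statistic depends only on the T and U counts, which is why it is not inflated by population stratification.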


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


All samples used in this study arose from investigations approved by the indi- 
vidual and respective Institutional Review Boards in the USA and at inter- 
national sites where relevant. Informed consent was obtained for all adult 
study participants; for children under age 18, both the consent of the parents 
or guardians and the assent of the child were obtained. 

Primary study samples: AGRE samples. The Autism Genetic Resource 
Exchange (AGRE) curates a collection of DNA and phenotypic data from multi- 
plex families with autism spectrum disorder (ASD) available for genetic 
research. We genotyped individuals from 801 families, selecting those with at 
least one child meeting criteria for autism by the Autism Diagnostic Interview- 
Revised (ADI-R), whereas the second affected child had an AGRE classification 
of autism, broad spectrum (patterns of impairment along the spectrum of per- 
vasive developmental disorders, including pervasive developmental disorder not 
otherwise specified (PDD-NOS) and Asperger’s syndrome) or not quite autism 
(NQA, individuals who are no more than one point away from meeting autism 
criteria on any or all of the social, communication, and/or behaviour domains 
and meet criteria for ‘age of onset’; or, individuals who meet criteria on all 
domains, but do not meet criteria for the ‘age of onset’). We excluded probands 
with widely discrepant classifications of affection status via the ADI-R and 
Autism Diagnostic Observation Schedule (ADOS) that could not be reconciled. 
We also excluded families with known chromosomal abnormalities (where kar- 
yotyping was available), and those with inconsistencies in genetic data (generat- 
ing excess Mendelian segregation errors or showing genotyping failure on a test 
panel of 24 SNPs used to check gender and sample identity with the full array 
data). The self-reported race/ethnicity of these samples is 69% white, 12% 
Hispanic/Latino, 10% unknown, 5% mixed, 2.5% each Asian and African 
American, less than 1% native Hawaiian/Pacific Islander and American 
Indian/native Alaskan. 

Primary study samples: NIMH samples. The NIMH Autism Genetics Initiative 
maintains a collection of DNA from multiplex and simplex families with ASD. We 
genotyped individuals from 341 nuclear families, 258 of which were independent 
of the AGRE data set, with at least one child meeting criteria for autism by the 
ADI-R, and a second child considered affected using the same criteria as described 
for the AGRE data set above. Similar exclusion criteria were used, including 
known chromosomal abnormalities and excess non-Mendelian inheritance. 
The self-reported race/ethnicity of these samples is 83% white, 4% Hispanic, 
2% unknown, 7% mixed, 3% Asian and 1% African American. 

Primary study samples: merged data set for primary screening. We used the 
Birdseed algorithm for genotype calling at both genotyping centres. As 324 
individuals were genotyped at both centres, we performed a concordance check. 
One sample showed substantial differences between the two centres, but no 
excess of Mendelian errors, indicating that a sample mix-up occurred in which 
each centre genotyped a different sibling that was identified as the same sample. 
Excluding this sample, overall genotype concordance between the two centres 
was 99.72%. 

Before merging data, we examined the distribution of chi-squared values and 
used a series of quality control (QC) filters designed to identify a robust set of 
SNPs. We discovered that filtering AGRE genotypes to 98% completeness and less 
than 10 Mendelian errors (MEs) was sufficient to remove SNPs that artificially 
inflated the chi-squared distribution for SNPs with MAF > 0.05. For MAF < 0.05, 
we observed much greater inflation (λ = 1.17), due entirely to a strong excess of 
SNPs with under-transmission of the minor allele (OR < 1). Whereas the same 
filters yielded high-quality results for SNPs with over-transmission of the minor 
allele (λ = 1.04), we found that much stricter filtering was required for rarer SNPs 
with OR < 1 (missing data <0.005). This is not unexpected based on a 
well-documented bias in the TDT: if missing data are preferentially biased against 
heterozygotes or rare homozygotes, significant, artificial over-transmission of 
the common allele is expected. To achieve comparable quality for the 
NIMH data set, we filtered on 96% completeness and fewer than 4 MEs. Our 
final QQ plot for the combined data set is shown in Supplementary Fig. 1 and has 
λ = 1.03, less than that observed in the Wellcome Trust Case Control Consortium 
paper for five of the seven phenotypes studied. The combined data set, consisting 
of 1,031 families (856 with two parents) and a total of 1,553 affected offspring, was 
used for association testing. 
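The inflation factor λ quoted in this paragraph is conventionally computed as the median of the observed association chi-square statistics divided by the median of a chi-square distribution with one degree of freedom (about 0.455). The sketch below shows that conventional definition; it is an assumption that the same definition was used here, the function name is hypothetical, and the input values are made up for illustration.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_values):
    """Genomic inflation factor: observed median chi-square over null median."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN


# Made-up statistics, not values from the study.
example_chi2 = [0.02, 0.21, 0.47, 0.95, 2.4]
print(round(genomic_inflation(example_chi2), 2))
```

A value close to 1 indicates that the bulk of the test statistics follow the null distribution, which is the basis for comparing λ across allele-frequency and transmission-direction bins as described above.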

For linkage analyses, the combined AGRE/NIMH data set was further merged 
with Illumina 550K genotype data generated at the Children’s Hospital of 
Philadelphia (CHOP) and available from AGRE, adding ~300 nuclear families 
(1,499 samples). We used the extensive overlap of samples between the AGRE/ 
NIMH and the CHOP data sets (2,282 samples) to select an extremely high 
quality set of SNPs for linkage analysis. Specifically, we required SNPs to be 
on both the Affymetrix 500K/5.0 and Illumina 550K platforms, with >99.5% 
concordance across platforms. We further restricted SNPs to MAF > 0.2, <1% 
missing data, Hardy-Weinberg P > 0.01, and no more than 1 ME. This left
~36,000 SNPs of outstanding quality. For autosomal SNPs, we further pruned
using PLINK to remove SNPs with r² > 0.1, yielding 16,311 SNPs.
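
The per-SNP criteria for the linkage set can be written as a single predicate. The sketch below is illustrative only: the `SnpStats` record and its field names are hypothetical, the thresholds are those stated above, and the subsequent LD pruning step performed in PLINK is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; field names are assumptions."""
    snp_id: str
    on_both_platforms: bool   # present on Affymetrix 500K/5.0 and Illumina 550K
    concordance: float        # cross-platform genotype concordance (fraction)
    maf: float                # minor allele frequency
    missing_rate: float       # fraction of missing genotypes
    hwe_p: float              # Hardy-Weinberg P value
    mendelian_errors: int     # number of Mendelian errors


def passes_linkage_qc(s: SnpStats) -> bool:
    """Apply the thresholds described in the text for linkage SNPs."""
    return (
        s.on_both_platforms
        and s.concordance > 0.995
        and s.maf > 0.2
        and s.missing_rate < 0.01
        and s.hwe_p > 0.01
        and s.mendelian_errors <= 1
    )


# Hypothetical SNP records, for illustration only.
good = SnpStats("snp_a", True, 0.999, 0.31, 0.002, 0.40, 0)
rare = SnpStats("snp_b", True, 0.999, 0.05, 0.002, 0.40, 0)
kept = [s.snp_id for s in (good, rare) if passes_linkage_qc(s)]
```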

Replication samples: NIMH control samples. Controls obtained from the 
NIMH Genetics Repository were genotyped on the Affymetrix 500K platform 
at the Broad Institute Genetic Analysis Platform for another study’. Of these, 
1,494 matched well with our sample, and were used as controls to compare with 
the cases and parents in our study. 

Replication samples: Montreal samples. Subjects diagnosed with autism spec- 
trum disorders with both of their parents were recruited from clinics specializing 
in the diagnosis of Pervasive Developmental Disorders (PDD), readaptation 
centres, and specialized schools in the Montreal and Quebec City regions, 
Canada, as described*'. Subjects with ASD were diagnosed by child psychiatrists 
and psychologists expert in the evaluation of ASD. Evaluation based on the 
Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria included 
the use of the ADI-R” and the ADOS”. As an additional screening tool for the 
diagnosis of ASD, the Autism Screening Questionnaire, which is derived from 
the ADI-R, was completed*’. Furthermore, all proband medical charts were 
reviewed by a child psychiatrist expert in PDD to confirm their diagnosis and 
exclude subjects with any co-morbid disorders. Exclusion criteria were: (1) an 
estimated mental age <18 months; (2) a diagnosis of Rett syndrome or child- 
hood disintegrative disorder; and (3) evidence of any psychiatric and neuro- 
logical conditions including: birth anoxia, rubella during pregnancy, fragile X 
syndrome, encephalitis, phenylketonuria, tuberous sclerosis, Tourette and West 
syndromes. Subjects with these conditions were excluded based on parental 
interview and chart review. However, participants with a co-occurring diagnosis 
of semantic-pragmatic disorder (owing to its large overlap with PDD), attention 
deficit hyperactivity disorder (seen in a large number of patients with ASD 
during development), and idiopathic epilepsy (related to the core syndrome of 
ASD) were eligible for the study. 

Replication samples: Santangelo EDSP family samples. Families were ascer- 
tained for having one or more autistic children and at least one non-autistic child 
aged 16 or older for an extremely discordant sibling-pair linkage study. 
Recruitment took place in Massachusetts and surrounding states through con- 
tacts with parent support and patient advocacy groups, brochures, newsletters 
and the study website. Parents were interviewed about their children, and non- 
autistic children were interviewed about themselves. An informant/caregiver, 
usually the proband’s mother, was interviewed using the ADI-R to confirm 
the diagnosis of autism at age 4-5 years. Families were included if the affected
children met Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV)
criteria for autistic disorder and their non-autistic siblings (aged 16 and
older) did not display any of the broader autism phenotype traits, which were
assessed with the M-PAS-R, the Pragmatic Language Scale (PLS), and the
Friendship Interview. Probands were excluded if they had medical conditions
associated with autism such as fragile X syndrome or gross CNS injury, or if they
were under 4 years of age, owing to the possible uncertainty in diagnosis at
younger ages. Twenty-nine families met eligibility criteria for the study and 
comprised the final sample for analysis. 

Replication samples: high functioning autism family samples. Families were 
included if their affected child had been previously diagnosed with Autism or 
Asperger’s syndrome, had a level of intellectual functioning above the range of 
mental retardation (that is, full scale, verbal and performance IQ > 70), chro- 
nological age between 6 and 21 years, and an absence of significant medical or 
neurological disorders (including fragile X syndrome and tuberous sclerosis). 
Families were ascertained and recruited through the Acute Residential 
Treatment (ART) programmes and outpatient child and adolescent services at 
McLean Hospital, as well as through associated hospitals and clinics. Brochures 
and a website were also used. Thirty-three families (133 participants) were 
enrolled in the study. Participation was voluntary. 

Replication samples: MGH-Finnish collaborative samples. Altogether 58 indi- 
viduals with a diagnosis of high functioning autism (HFA) or Asperger’s syn- 
drome were recruited in Finland. Fifty-two children and adolescents aged 
8-15 years were identified from patient records at the Oulu University 
Hospital in 2003. These children and adolescents have been evaluated for 
HFA/Asperger’s syndrome at the Oulu University Hospital. In addition, six 
children (3 boys, 3 girls) 11 years of age were recruited from an epidemiological 
study conducted in 2001 (ref. 37). 

All participants had full-scale IQ scores greater than or equal to 80 measured 
with the Wechsler Intelligence Scale for Children, Third Revision. Furthermore,
none of the children was diagnosed with other developmental disorders
(for example, dysphasia, fragile X syndrome). Clinical diagnoses of HFA/ 
Asperger’s syndrome were confirmed by administering the ADI-R* and the 
ADOS*. Of the 58 participants with HFA/Asperger’s syndrome, 35 met the dia- 
gnostic criteria for Asperger’s syndrome and 21 met the diagnostic criteria for HFA 
according to ICD-10 (International Classification of Diseases v. 10) diagnostic
criteria”. Two participants met diagnostic criteria for PDD-NOS; these partici- 
pants were excluded owing to their manifesting different and less severe symptoms 
than our sample of children with HFA or Asperger’s syndrome. 

Replication samples: Children’s Hospital Boston samples. Probands with a 
documented history of clinical diagnosis of ASD were recruited at Children’s 
Hospital Boston. To participate, they had to be over 24 months of age and have at 
least one biological parent or an affected sibling available. Subjects were excluded 
if they had an underlying metabolic disorder or any chronic systemic disease, an 
acquired developmental disability (for example, birth asphyxia, trauma-related 
injury, meningitis, etc.), or cerebral palsy. All participants provided informed 
consent and a phenotyping battery was performed including the ADOS, the ADI-R
and other measures to assess cognitive status. Seventy-five per cent of subjects
with a clinical diagnosis met strict research criteria for ASD on both ADI-R and 
ADOS. In addition, a complete family and medical history was obtained. 

Replication samples: homozygosity mapping collaborative for autism
(HMCA) samples. Families with cousin marriages and children affected by 
ASD with or without mental retardation were recruited by multiple collabora- 
tors in the HMCA. The patients from Istanbul were evaluated by a child psychi- 
atrist (N. M. Mukaddes) trained in the ADOS and ADI-R, and who made 
diagnoses according to DSM-IV-TR criteria and the Childhood Autism Rating 
Scale (CARS). Patients from Kuwait were enrolled from the Kuwait Centre for 
Autism by S. Al-Saad. In Jeddah, Saudi Arabia, patients were evaluated by both a 
developmental paediatrician (S. Balkhy) and a paediatric neurologist (G. 
Gascon) and diagnoses were based on DSM-IV-TR criteria. In Lahore, 
Pakistan, a neurologist (A. Hashmi) with training in the ADOS and ADI-R 
diagnosed patients using DSM-IV-TR criteria. In most settings, patients were 
enrolled from tertiary clinical centres and these patients had standard of care 
neuromedical assessments, including physical examination, medical and neuro- 
logical history, fragile X testing, and other genetic and metabolic testing when 
indicated. MRI was obtained for patients in whom a brain malformation was 
suspected or seizures were present. In addition, IQ scores (usually from the 
Stanford-Binet) and adaptive behaviour measures were obtained from the 
patients’ existing medical records. Secondary assessments were conducted on 
the most informative pedigrees by the Boston clinical team in collaboration with 
local multi-disciplinary teams. Clinical members of the Boston team included: 
developmental psychologists (J. Ware, E. LeClaire, R. M. Joseph), paediatric 
neurologists (G. H. Mochida, A. Poduri), a clinical geneticist (W.-H. Tan) and 
a neuropsychiatrist (E. M. Morrow). The secondary assessment battery was 
designed to obtain a comprehensive description of current and historical autism 
symptomatology, cognitive and adaptive functioning, and neurological and 
physical morphological status in the patient and pedigree. The secondary assess- 
ment included: neurological examination; genetic dysmorphology examination; 
the CARS; the Social Communication Questionnaire administered with probing 
on par with the ADI-R by ADI-R reliable examiners; the ADOS (usually module 
1); the Vineland Adaptive Behaviour Scales, second edition (VABS-II); Kaufman 
Brief Intelligence Test, second edition (KBIT-II). ADOS assessments were video- 
taped and dysmorphology findings were photographed for archival purposes. 

Replication samples: AGP samples. Individuals typically received at least two of
three evaluations for autism symptoms: ADI-R, ADOS and clinical evaluation. 
Of the 1,679 affected individuals from 1,443 families, 966 met criteria for autism 
on the ADI-R and ADOS and most of these also had a clinical evaluation of 
autism; 160 affected individuals met criteria for autism on one of the two dia- 
gnostic instruments (ADI-R, ADOS) but were missing information on the other 
instrument; and, 553 individuals met criteria for spectrum disorder on one or 
both instruments. Affected individuals were recruited from both simplex and 
multiplex families, 71% of this sample being from multiplex families. Most of the 
families were of European ancestry (83%). 

Replication samples: Finnish autism family samples. Families were recruited 
through university and central hospitals. Detailed clinical and medical examina- 
tions were performed by experienced child neurologists as described elsewhere”. 
Diagnoses were based on ICD-10 and DSM-IV diagnostic nomenclatures.
Families with known associated medical conditions or chromosomal abnormalities 
were excluded from the study. A total of 106 families included 400 individuals for 
whom genotype data was available. Of these, 111 had a diagnosis of infantile autism 
and 13 a diagnosis of Asperger’s syndrome. All families were Finnish, except for one 
family where the father was Turkish. 

Replication samples: Iranian trio samples. Eligible participants in this study 
were Iranian families with at least one child affected with ASD, including cases of 
autistic disorder, Asperger’s syndrome and PDD-NOS. Eighty families (282 
individuals) from Iran were ascertained and assessed. This sample was ascer- 
tained by screening and diagnostic testing of over 90,000 preschool children 
from Tehran in 2004. Diagnoses of children were made according to DSM-IV 
criteria via the ADI-R and the ADOS. Patients with abnormal karyotypes and 
dysmorphic features were excluded. Most of the families were father-mother-child
trios, but some had more than one affected child. All affected biological
siblings were assessed with the same diagnostic tools. We have ascertained and 
assessed 80 families (282 individuals) from Iran. 

Affymetrix genotyping. The AGRE samples were genotyped on Affymetrix 5.0 
chips at the Genetic Analysis Platform of the Broad Institute, using standard 
protocols. The 5.0 chip was designed to genotype nearly 500,000 SNPs across the 
genome to enable genome-wide association studies**’’. The NIMH controls 
were genotyped at the Broad Institute using the Affymetrix 500K Sty and Nsp 
chips, using a similar protocol®. The Autism Consortium and Montreal replica- 
tion samples were also genotyped at the Broad Institute under the same condi- 
tions. The NIMH autism samples were genotyped at the Johns Hopkins Center 
for Complex Disease on the Affymetrix 500K (Nsp and Sty) and 5.0 platforms 
using similar standard protocols. 

Genotype calling for the 5.0 arrays was performed by Birdseed’*”’ and for the 
500K arrays was performed by BRLMM. As basic QC filters for the data generated 
at the Broad Institute, we required that genotyping was >95% complete for each 
individual, and that each family had fewer than 10,000 Mendelian inheritance 
errors across the genome. We also required that each SNP had >95% genotyping, 
fewer than 15 Mendelian errors, Hardy-Weinberg equilibrium P > 10⁻¹⁰, and
minor allele frequency greater than 1%. For the AGRE sample, this left 2,883 high- 
quality individuals genotyped for 399,147 SNPs with 99.6% average call rate. The 
basic filters for the data generated at Johns Hopkins were individual call rates 
>95% for 5.0 arrays and >90% for 500K arrays data, fewer than 5,000 Mendelian 
errors per family. Only monomorphic SNPs and those with greater than 50% 
missing data were dropped, for 498,216 SNPs. Our combined data set had nearly 
365,000 SNPs passing QC. 
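
For readers unfamiliar with the Hardy-Weinberg filter, the sketch below shows one common way to obtain a P value from genotype counts: a 1-d.f. chi-squared goodness-of-fit test. This is an assumption made for illustration; the text does not state which Hardy-Weinberg test was used (exact tests are also common), and the function name is hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-d.f. chi-squared test for Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotypes at a
    biallelic SNP (AA, AB, BB).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-squared with 1 d.f.: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Genotype counts exactly at Hardy-Weinberg proportions give P = 1.
p_balanced = hwe_chi2_pvalue(25, 50, 25)
# A complete absence of heterozygotes gives an extremely small P value.
p_no_hets = hwe_chi2_pvalue(50, 0, 50)
```

A SNP would be retained under the per-SNP filter above if its P value exceeds the stated threshold.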

Sequenom genotyping. SNPs were assayed using Sequenom technology for the 
AGP samples at three centres, namely Gulbenkian, Mt Sinai and Oxford: DNA 
from 1,629 families representing numerous recruiting sites was genotyped for 54 
SNPs. SNPs with >3% missing data, namely rs4690464, rs10513025, and
rs17088296, were excluded from analysis. The next step in our QC process was
to remove families with ≥4 Mendelian errors, out of 51 remaining loci, under the
assumption that this indicated pedigree errors. Data from 110 families were 
removed owing to Mendelian errors. Thereafter, SNPs were removed if they 
showed excessive Mendelian errors (>16) in the remaining families. Using this 
criterion, two more SNPs, rs155437 and rs1925058, were removed from analysis. 
It was apparent that DNA quality varied by study site and could be responsible 
for concomitant genotype quality differences. Therefore, we also evaluated rate 
of missing genotypes per locus and study site. Our analyses showed that DNA 
from a few population samples showed excess missingness for two SNPs, 
rs4742408 and rs7869239, relative to the remaining population samples.
Specifically three population samples showed more than 7% missing genotypes 
for rs4742408 and rs7869239 whereas the remaining population samples had 
about 1% or less missing genotypes. Therefore, for these loci we deleted geno- 
types only from the samples showing excess missingness. As a final QC step, we 
then evaluated missing genotypes for the remaining loci. If more than five loci 
were missing genotypes, the individual’s data was removed from analysis. By this 
criterion 76 additional families became uninformative for family-based asso- 
ciation analysis, leaving 1,443 families for association analysis. The Finnish 
autism samples were genotyped in the Peltonen laboratory, and the Iranian trios 
were genotyped at the Broad Institute using very similar protocols. All samples 
were genotyped using aliquots from the same pooled primers and probes. 
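
The family-level Mendelian error filter described above can be sketched as follows. This is an illustrative implementation for autosomal biallelic SNPs in parent-offspring trios, not the software used for the AGP samples; genotypes are coded as minor-allele counts, missing genotypes are skipped, and all names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # minor-allele count 0, 1 or 2; None means missing
Trio = Tuple[Genotype, Genotype, Genotype]  # (father, mother, child) at one locus

# Alleles (0 = major, 1 = minor) that a parent with a given genotype can transmit.
TRANSMISSIBLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_error(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's genotype cannot arise from the parental genotypes."""
    if father is None or mother is None or child is None:
        return False  # cannot be evaluated with missing data
    possible = {a + b for a in TRANSMISSIBLE[father] for b in TRANSMISSIBLE[mother]}
    return child not in possible


def count_mendelian_errors(loci: Sequence[Trio]) -> int:
    """Number of loci at which a trio shows a Mendelian inconsistency."""
    return sum(is_mendelian_error(f, m, c) for f, m, c in loci)


def family_fails_qc(loci: Sequence[Trio], min_errors_to_drop: int = 4) -> bool:
    """Flag a family for removal when it reaches the error threshold in the text."""
    return count_mendelian_errors(loci) >= min_errors_to_drop


# Hypothetical trio genotyped at five loci; only the first locus is inconsistent.
example_loci = [(0, 0, 1), (1, 1, 2), (0, 2, 1), (None, 1, 2), (2, 1, 1)]
example_errors = count_mendelian_errors(example_loci)
```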

Copy number analysis. Because of previous reports of two large (>1 Mb),
independent de novo deletions spanning this locus, we assessed the region
surrounding rs10513025 and the entire SEMA5A locus for copy number variation
that could either explain or provide independent evidence of the importance
of this region to autism, using Birdsuite to analyse all Affymetrix 5.0
samples. Birdsuite genotypes previously annotated common copy number
polymorphisms (CNPs) and in parallel searches for novel copy number variants (CNVs)
using a hidden Markov model (HMM). Probe coverage in the region was good, with no
50-kb window having fewer than 10 probes and an average spacing between probes of 2.5 kb,
allowing very good sensitivity for CNVs greater than 25 kb. We found no deletions
or duplications near this SNP, nor any overlapping the gene SEMA5A. The
closest CNVs upstream and downstream of this SNP appeared to be a rare (~2-3%
frequency, previously annotated CNP) 40-kb deletion 288 kb from the
3′ end of SEMA5A, and a rare (~1% frequency, novel) 20-kb deletion 356 kb
upstream of the 5′ end of SEMA5A. Each of these appeared to be a segregating
polymorphism, but both fall far outside the boundaries of SEMA5A and TAS2R1
and far beyond the linkage disequilibrium block containing rs10513025.

Expression analysis. Fresh-frozen brain tissue samples dissected from the cortex
(Brodmann area 19) were obtained through the Autism Tissue Program
(http://www.atpportal.org) from the Harvard Brain Bank and the NICHD Brain and
Tissue Bank at the University of Maryland from 20 samples with a primary
diagnosis of autism, and 10 controls. Total RNA was extracted using TRIzol
reagent (Invitrogen) according to the manufacturer’s protocol. Complementary
DNA (cDNA) was generated from 8 µg of total RNA using the
Superscript III First-Strand Synthesis kit (Invitrogen). cDNA was diluted 1:5
in 10 mM Tris and 1 µl of diluted cDNA was used per 10 µl PCR reaction.
Quantitative real-time PCR was performed on a Lightcycler 480 (Roche
Applied Science) using 2× Taqman Gene Expression Master Mix and probes
obtained from Applied Biosystems (ABI): SEMA5A (Hs01549381_m1), MAP2
(Hs01103234_g1), TBP (Hs00920497_m1), GAPDH (4333764F). For multiplex
reactions, 0.5 µl FAM-labelled SEMA5A probe and 0.5 µl VIC-labelled MAP2
probe were used per 10 µl reaction. The amount of SEMA5A relative to MAP2
was determined for each case using the ΔΔCt method. Comparison of SEMA5A
to TBP and GAPDH yielded similar results. Logistic regression was performed on
autism status, adjusting for age at death, post-mortem interval, sex and SEMA5A
expression, with a 1-sided P-value reported for the association of lower SEMA5A
expression with autism status.
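
The relative quantification step can be illustrated with the standard comparative threshold-cycle calculation. The sketch below assumes 100% amplification efficiency (a doubling per cycle) and a calibrator sample chosen by the analyst; the function name, the choice of calibrator and the Ct values are hypothetical and are not taken from this study.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
) -> float:
    """Fold change of a target gene by the comparative Ct (delta-delta Ct) method.

    delta Ct       = Ct(target) - Ct(reference gene), within one sample
    delta-delta Ct = delta Ct(sample) - delta Ct(calibrator)
    fold change    = 2 ** (-delta-delta Ct), assuming perfect doubling per cycle
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold one cycle later
# (relative to the reference gene) in the sample than in the calibrator,
# which corresponds to half the relative expression.
fold_change = relative_expression(26.0, 20.0, 25.0, 20.0)
```

Here the target would correspond to SEMA5A and the reference gene to MAP2 (or TBP or GAPDH in the comparison analyses).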

Determination of significance. To determine an appropriate experimental threshold
for genome-wide significance, permutation was performed on this data set by
gene-dropping, and genome-wide significance was estimated by taking the lowest
P-value from each of 1,000 permuted data sets and using the 50th lowest of these as a
threshold for P < 0.05 experiment-wide significance ($P < 2.5 \times 10^{-7}$). To calculate an
estimate of the effective number of tests ($T_{\mathrm{eff}}$), we used the following algorithm: (1) start with
the most 5′ SNP on a chromosome ($\mathrm{SNP}_{i,j}$), where $i$ = chromosome and $j$ = SNP
position, and calculate pairwise LD with all downstream SNPs within 1 Mb
($r^2[\mathrm{SNP}_{i,j} \times \mathrm{SNP}_{i,j+k}]$). (2) For $\mathrm{SNP}_{i,j}$, $T_{\mathrm{eff}(i,j)} = 1 - \max(r^2[\mathrm{SNP}_{i,j} \times \mathrm{SNP}_{i,j+k}])$. (3)
For chromosome $i$, $T_{\mathrm{eff}(i)} = \sum_{j=1}^{n} T_{\mathrm{eff}(i,j)}$, where $n$ = the total number of SNPs on
a chromosome. (4) $T_{\mathrm{eff}} = \sum_{i} T_{\mathrm{eff}(i)}$, summed over all chromosomes. Because this algorithm only
accounts for pairwise LD, it provides a conservative estimate of the number of effective tests.
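
The effective-number-of-tests algorithm and the permutation threshold can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: LD is approximated by the squared Pearson correlation of genotype allele counts, SNPs are assumed to be sorted by position, a SNP with no downstream neighbour within the window contributes a full test, and all function names and data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def effective_tests_chromosome(positions, genotypes, window_bp=1_000_000):
    """Sum over SNPs of 1 - max r^2 with downstream SNPs within `window_bp`.

    `positions` must be sorted; `genotypes[j]` holds allele counts for SNP j.
    """
    total = 0.0
    for j in range(len(positions)):
        max_r2 = 0.0
        for k in range(j + 1, len(positions)):
            if positions[k] - positions[j] > window_bp:
                break
            max_r2 = max(max_r2, r_squared(genotypes[j], genotypes[k]))
        total += 1.0 - max_r2
    return total


def effective_tests(chromosomes):
    """Genome-wide total: sum of the per-chromosome values."""
    return sum(effective_tests_chromosome(pos, geno) for pos, geno in chromosomes)


def permutation_threshold(min_pvalues, alpha=0.05):
    """Threshold from permutations: e.g. the 50th lowest of 1,000 minimum P values."""
    ranked = sorted(min_pvalues)
    index = max(int(round(alpha * len(ranked))) - 1, 0)
    return ranked[index]


# Hypothetical chromosome: two SNPs in perfect LD plus one SNP 5 Mb away.
toy_positions = [100, 200, 5_000_000]
toy_genotypes = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 0, 1, 1]]
toy_teff = effective_tests_chromosome(toy_positions, toy_genotypes)
```

In the toy example the first SNP is fully tagged by the second and so contributes nothing, while the other two each count as one test.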

Gulisa Turashvili, Richard Varhol, René L. Warren, Peter Watson, Yongjun Zhao, Carlos Caldas,

Recent advances in next generation sequencing have made it
possible to precisely characterize all somatic coding mutations that
occur during the development and progression of individual cancers.
Here we used these approaches to sequence the genomes (>43-fold
coverage) and transcriptomes of an oestrogen-receptor-α-positive
metastatic lobular breast cancer at depth. We found 32
somatic non-synonymous coding mutations present in the metastasis,
and measured the frequency of these somatic mutations in
DNA from the primary tumour of the same patient, which arose
9 years earlier. Five of the 32 mutations (in ABCB11, HAUS3,
SLC24A4, SNX4 and PALB2) were prevalent in the DNA of the
primary tumour removed at diagnosis 9 years earlier, six (in
KIF1C, USP28, MYH8, MORC1, KIAA1468 and RNASEH2A) were
present at lower frequencies (1-13%), 19 were not detected in the 
primary tumour, and two were undetermined. The combined ana- 
lysis of genome and transcriptome data revealed two new RNA- 
editing events that recode the amino acid sequence of SRP9 and 
COG3. Taken together, our data show that single nucleotide muta- 
tional heterogeneity can be a property of low or intermediate grade 
primary breast cancers and that significant evolution can occur 
with disease progression. 

Lobular breast cancer is an oestrogen-receptor-positive (ER+, also
known as ESR1+) subtype of breast cancer (approximately 15% of all
breast cancers). It is usually of low-intermediate histological grade
and can recur many years after initial diagnosis. To interrogate the
genomic landscape of this class of tumour, we re-sequenced the
DNA from a metastatic lobular breast cancer specimen (89% tumour 
cellularity; Supplementary Fig. 1) at approximately 43.1-fold aligned, 
haploid reference genome coverage (120.7 gigabases (Gb) aligned 
paired-end sequence; Supplementary Fig. 2, Table 1 and Supplemen- 
tary Methods). Deep high-throughput transcriptome sequencing 
(RNA-seq)° performed on the same sample generated 160.9-million 
reads that could be aligned (Supplementary Table 1, see also 
Supplementary Fig. 2 and Supplementary Methods). The saturation 
of the genome (Table 1) and RNA-seq (Supplementary Table 1) 
libraries for single nucleotide variant (SNV) detection is discussed 
in Supplementary Information. The aligned (hg18) reads were used 
to identify (Supplementary Fig. 2) the presence of genomic aberra- 
tions, including SNVs (Supplementary Table 2), insertions/deletions 
(indels), gene fusions, translocations, inversions and copy number 
alterations (Supplementary Methods). We examined predicted
coding indels and predicted inversions (coding or non-coding;
Supplementary Methods); however, all of the events that were vali- 
dated by Sanger re-sequencing were also present in the germ line 
(Supplementary Tables 3 and 4). None of the 12 predicted gene 
fusions revalidated. We also computed the segmental copy number 
(Supplementary Methods and Supplementary Table 5a) from aligned 
reads, and revalidated high level amplicons by fluorescence in situ 
hybridization (FISH) (Supplementary Table 5b), revealing the pres- 
ence of a new low-level amplicon in the INSR locus (Supplementary 
Fig. 3). 

We identified coding SNVs from aligned reads, using a Binomial 
mixture model, SNVMix (Supplementary Table 2, Methods and 
Supplementary Appendix 1). From the RNA-seq (WTSS-PE) and 
genome (WGSS-PE) libraries we predicted 1,456 new coding non- 
synonymous SNVMix variants (Supplementary Table 2). After the 
removal of pseudogene and HLA sequences (1,178 positions remain- 
ing) and after primer design, we re-sequenced (Sanger amplicons) 
1,120 non-synonymous coding SNV positions in the tumour DNA 
and normal lymphocyte DNA. Some 437 positions (268 unique to 
WGSS-PE, 15 unique to WTSS-PE, and 154 in common) were con- 
firmed as non-synonymous coding variants. Of these, 405 were new 


Table 1 | Summary of sequence library coverage

| | WGSS-PE | WTSS-PE |
|---|---|---|
| Total number of reads | 2,922,713,774 | 182,532,650 |
| Total nucleotides (Gb) | 140.991 | 7.108 |
| Number of aligned reads | 2,502,465,226 | 160,919,484 |
| Aligned nucleotides (Gb) | 120.718 | 6.266 |
| Estimated error rate | 0.021 | 0.013 |
| Estimated depth (non-gap regions) | 43.114 | NA |
| Canonically aligned reads | 2,294,067,534 | 109,093,616 |
| Exons covered | 93.5 at ≥10 reads; 95.7 at ≥5 reads | 82,200 at 10 reads (see also Supplementary Table 1) |
| Reads aligned canonically (%) | 78.49 | 67.79 |
| Unaligned reads | 420,248,548 | 21,613,166 |
| Mean read length (bp) | 48.24 | 38.94 |


The WGSS-PE column shows the genome paired-end read coverage for DNA from the 
metastatic pleural effusion sample. The WTSS-PE column shows coverage for the 
complementary DNA reads from the matched transcriptome libraries of the metastatic pleural 
effusion. Coverage of exon bases in the reference genome (hg18) is shown at 5 or more reads per 
position, and 10 or more reads per position for the metastatic genome. bp, base pairs; NA, not 
applicable. 
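
Several entries in Table 1 follow from one another by simple arithmetic, which the sketch below reproduces as a consistency check. The helper names are hypothetical, and the non-gap genome length of about 2.8 Gb used for the depth calculation is an assumption back-calculated from the table rather than a value stated in the text.

```python
def mean_read_length(total_nucleotides_gb: float, total_reads: int) -> float:
    """Mean read length in bp: total sequenced bases divided by number of reads."""
    return total_nucleotides_gb * 1e9 / total_reads


def percent(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


def haploid_depth(aligned_nucleotides_gb: float, non_gap_genome_gb: float) -> float:
    """Fold coverage: aligned bases divided by the non-gap reference length."""
    return aligned_nucleotides_gb / non_gap_genome_gb


# Values taken from Table 1.
wgss_read_len = mean_read_length(140.991, 2_922_713_774)   # about 48.24 bp
wtss_read_len = mean_read_length(7.108, 182_532_650)       # about 38.94 bp
wgss_canonical_pct = percent(2_294_067_534, 2_922_713_774)  # about 78.49%

# Assumed non-gap reference length (Gb); not stated in the text.
ASSUMED_NON_GAP_GENOME_GB = 2.8
wgss_depth = haploid_depth(120.718, ASSUMED_NON_GAP_GENOME_GB)  # about 43.11-fold
```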


¹Molecular Oncology, ²Centre for Translational and Applied Genomics, ³Michael Smith Genome Sciences Centre, BC Cancer Agency, 675 West 10th Avenue, Vancouver V5Z 1L3,
Canada. ⁴Medical Oncology, BC Cancer Agency, 600 West 10th Avenue, Vancouver V5Z 1L3, Canada. ⁵Department of Pathology, University of British Columbia, G227-2211
Wesbrook Mall, British Columbia, Vancouver V6T 2B5, Canada. ⁶Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
⁷Deeley Research Centre, BC Cancer Agency, Victoria V8R 6V5, Canada.
*These authors contributed equally to this work.
Table 2 | Somatic coding sequence SNVs validated by Sanger sequencing

| Gene | Description | Position | Source | Allele change | Amino acid change |
|---|---|---|---|---|---|
| ABCB11 | Bile salt export pump (ATP-binding cassette sub-family B member 11) | 2:169497197 | WGSS | C>T | R>H |
| HAUS3 | HAUS3 coiled-coil protein (C4orf15) | 4:2203607 | WGSS, WTSS | C>T | V>M |
| CDC6 | Cell division control protein 6 homologue | 17:35701114 | WGSS, WTSS | G>A | E>K |
| CHD3 | Chromodomain-helicase-DNA-binding protein 3 | 17:7751231 | WGSS | G>A | E>K |
| DLG4 | Disks large homologue 4 | 17:7052251 | WGSS | G>A | P>L |
| ERBB2 | Receptor tyrosine-protein kinase erb-b2 | 17:35133783 | WGSS, WTSS | C>G | I>M |
| FGA | Fibrinogen alpha chain | 4:155726802 | WGSS | C>T | W>stop |
| GOLGA4* | Golgin subfamily A member 4 | 3:37267947 | WGSS, WTSS | G>C | E>Q |
| GSTCD | Glutathione S-transferase C-terminal domain-containing protein | 4:106982671 | WGSS, WTSS | G>C | E>Q |
| KIAA1468* | LisH domain and HEAT repeat-containing protein | 18:58076768 | WGSS, WTSS | G>C | R>T |
| KIF1C | Kinesin-like protein KIF1C | 17:4848025 | WGSS, WTSS | G>C | K>N |
| KLHL4 | Kelch-like protein 4 | X:86659878 | WGSS | C>T | S>L |
| MYH8 | Myosin 8 (myosin heavy chain 8) | 17:10248420 | WGSS | C>G | M>I |
| PALB2 | Partner and localizer of BRCA2 | 16:23559936 | WGSS | T>G | E>A |
| PKDREJ | Polycystic kidney disease and receptor for egg-jelly-related protein | 22:45035285 | WGSS | C>G | E>Q |
| RASEF | RAS and EF-hand domain-containing protein | 9:84867250 | WTSS | G>A | S>L |
| RNASEH2A | Ribonuclease H2 subunit A (EC 3.1.26.4) | 19:12785252 | WGSS, WTSS | G>A | R>H |
| RNF220 | RING finger protein C1orf164 | 1:44650831 | WGSS | G>A | D>N |
| SP1 | Transcription factor Sp1 | 12:52063157 | WGSS | G>C | E>Q |
| USP28 | Ubiquitin carboxyl-terminal hydrolase 28 | 11:113185109 | WGSS, WTSS | C>T | D>N |
| C11orf10 | UPF0197 transmembrane protein C11orf10 | 11:61313958 | WGSS | G>A | T>I |
| THRSP | Thyroid hormone-inducible hepatic protein | 11:77452594 | WGSS | C>T | R>C |
| SCEL | Sciellin | 13:77076497 | WGSS | A>G | K>R |
| SLC24A4 | Na+/K+/Ca2+-exchange protein 4 | 14:92018836 | WGSS | G>A | V>I |
| COL1A1 | Collagen alpha-1(I) chain precursor | 17:45625043 | WGSS | C>T | G>D |
| KIAA1772 | GREB1-like protein | 18:17278222 | WGSS | A>G | D>G |
| CCDC117 | Coiled-coil domain-containing protein 117 | 22:27506951 | WGSS | G>C | K>N |
| RP1-32I10.10 | Novel protein | 22:43140252 | WGSS | G>C | E>Q |
| MORC1 | MORC family CW-type zinc finger protein 1 | 3:110271286 | WGSS | G>A | A>V |
| SNX4 | Sorting nexin 4 | 3:126721688 | WGSS | C>T | D>N |
| LEPREL1 | Prolyl 3-hydroxylase 2 precursor (EC 1.14.11.7) | 3:191172415 | WGSS | T>C | E>G |
| WDR59* | WD repeat-containing protein 59 | 16:73500342 | WTSS | C>T | M>I |



Protein domain 
affected 


Transmembrane 
helix 3 


Unknown 


-terminal, 
unknown 
Unknown, 
C-terminal 


Unknown, 
-terminal 
Kinase domain 


Fibrinogen a/b/c 
domain 
Unknown, 
N-terminal 
Unknown, 
C-terminal 


ARM type fold 


Kinesin motor 
domain 
Unknown, 
N-terminal 
Actin-interacting 
protein domain 
N-terminal 
prefolding 
Unknown 


F-hand Ca?*- 
inding motif 


om 


Unknown, 
C-terminal 
Unknown, 
-terminal 
Glu-rich 


Unknown 


Transmembrane 
domain 


Unknown 


Unknown 
Transmembrane 
domain 

Pro-rich domain 


Unknown 
Unknown 


Unknown 
Coiled-coil 


Unknown, 
N-terminal 
Hydroxylase 
domain 
Unknown, 
C-terminal 


-terminal domain 


Expression 
(sequenced 
bases per 
exonic base) 


0.3 


14.1 


lal 


17.3 


Allelic 
expression 
bias (R, NR 
allele) 


dd. 


4, 23 
3h3 


41,11 
(Q<0.01) 


7,1 
62,35 
NA 
37,12 


23,8 


23;,11 
16, 13 
1,0 
NA 
NA 


NA 


3-2 


2,2 

NA 

40, 10 
(Q<0.01) 
che, 


13;3 


6,5 


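Copy number classification (HMM state), by gene:
ABCB11, Amplification (4); HAUS3, Neutral (2); CDC6, Amplification (4); CHD3, Neutral (2);
DLG4, Neutral (2); ERBB2, Amplification (4); FGA, Gain (3); GOLGA4, Gain (3);
GSTCD, Neutral (2); KIAA1468, Neutral (2); KIF1C, Neutral (2); KLHL4, Neutral (2);
MYH8, Neutral (2); PALB2, Amplification (4); PKDREJ, Gain (3); RASEF, Gain (3);
RNASEH2A, Neutral (2); RNF220, Neutral (2); SP1, Amplification (4); USP28, Gain (3);
C11orf10, Amplification (4); THRSP, Gain (3); SCEL, Gain (3); SLC24A4, Amplification (4);
COL1A1, Amplification (4); KIAA1772, Neutral (2); CCDC117, Neutral (2); RP1-32I10.10, Gain (3);
MORC1, Gain (3); SNX4, Gain (3); LEPREL1, Gain (3); WDR59, Neutral (2).
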

Omnibus table showing the features associated with the 32 Sanger amplicon-validated non-synonymous somatic mutations from the WGSS-PE and WTSS-PE libraries. Mutation positions are on the
basis of reference genome hg18. The nucleotide substitutions are shown as reference>variant. The amino acid change is shown as reference>variant amino acid. If the mutation occurs in a
recognized protein domain or motif this is shown. The transcript expression level in WTSS-PE reads is shown as the mean number of reads supporting each position in the transcript. The allelic
expression bias column shows the number of reference (R), non-reference (NR) reads in the WTSS-PE library at the mutated position. Three transcripts (CHD3, SP1 and COL1A1) show significant
expression bias (annotated with Q < 0.01, Supplementary Methods) in favour of the reference allele; however, none of the heterozygous somatic mutations were biased in favour of the
non-reference allele. The expression of HAUS3 is predominantly non-reference as expected for a homozygous allele. The HMM state classifier of copy number for the genomic region encompassing each
mutation position is shown in the last column, as state (state number). C-terminal, carboxy-terminal; N-terminal, amino-terminal.
*Genes showing alternative splicing. 
germline alleles and 32 were revealed as non-synonymous coding
somatic point mutations (Table 2). Of the 32 somatic mutations, 
30 were present in WGSS-PE and/or WTSS-PE, whereas two were 
detected from the WTSS library sequence alone (Table 2). None of 
the 32 genes were found in common with the CAN breast genes®, 
which were discovered from ER cell lines. Eleven genes appear in 
the current release of COSMIC’ (CHD3, SP1, PALB2, ERBB2, USP28, 
KLHL4, CDC6, KIAA1468, RNF220, COL1A1 and SNX4) but with 
mutations at different positions. We examined the population 
frequency of the somatic mutation positions for PALB2, ERBB2, 
USP28, CDC6, CHD3, HAUS3 (previously known as C4orf15), SP1, 
KIAA1468 and DLG4 in a further 192 breast cancers (Supplemen- 
tary Methods; 112 lobular, 80 ductal). None of these 192 breast 
cancers showed identical mutations to those described here; however, 
3 out of 192 cases (2 lobular, 1 ductal) contained neighbouring non- 
synonymous variants/deletions affecting the ERBB2 kinase domain 
(Supplementary Fig. 4). Interestingly, 2 out of 192 cases (both lobu- 
lar) contained two different heterozygous truncating variants in 
HAUS3: chr4:2203685 G>T on minus strand, GAG>TAG 
(Glu>stop), and chr4:2203483 C>G on minus strand, TCA>TGA 
(Ser>stop) (Supplementary Fig. 5). Notably, HAUS3 is a member of 
the recently described*"° multiprotein augmin complex, the function 
of which is required for genome stability mediated by appropriate 
kinetochore attachment and centrosome morphogenesis. 

To determine how many of the somatic non-synonymous coding 
sequence mutations were already present at diagnosis 9 years earlier, 
we next examined genomic DNA from the primary tumour directly, 
by a single molecule frequency counting experiment (Supplementary 
Methods)*. Twenty-eight of the 32 mutations yielded amplicons 
compatible with Illumina sequencing (Supplementary Methods), 
and two extra mutations were sampled by Sanger sequencing 
(Supplementary Fig. 5). As controls we selected 36 heterozygous 
germline SNVs at random. The PCR amplicons for known germline 
and somatic mutations were sequenced on an Illumina device. After 
alignment, the observed counts of reference and non-reference bases 
at the target position were compared using the Binomial exact test. 
To calibrate the expected mean of the Binomial distribution, we used 
the non-reference allele frequency from positions —5 to +5 sur- 
rounding (but not including) the target position (Supplementary 
Table 6a, b), where only reference bases should be called. Unequal 
segmental amplification/deletion in the genome may contribute to a 
departure from the theoretical ratio of 0.5 for a heterozygous allele. 
As a result, amplicons from heterozygous germline alleles showed 
occasional measured frequencies of between 0.2 and 0.8 in both the 
primary and metastatic tumour DNA (Table 3 and Supplementary 
Table 7), but with a modal frequency around 0.5, as expected. In the 
metastatic genomic DNA the somatic mutations showed frequencies 
of between 0.2 and 0.79 (Table 3). Notably, the somatic coding 
mutation positions examined in the primary tumour showed three 
patterns of abundance: prevalent, rare and undetectable (Table 3). 
Mutations in ABCB11, PALB2 and SLC24A4 were detected at prevalent 
frequencies for heterozygous mutations (≥0.2, the lowest value 
seen for known germline alleles) given a 73% tumour content. The 
frequency of the mutation in HAUS3 was 0.79, consistent with it 
being a prevalent homozygous mutation, also confirmed by Sanger 
sequencing (Supplementary Fig. 5). Sanger amplicon sequencing 
showed that the SNX4 somatic mutation was also present in the 
primary tumour, whereas the KIAA1772 (also known as GREB1L) 
mutation was not. Six mutations (KIF1C, USP28, MORC1, MYH8, 
KIAA1468 and RNASEH2A) showed statistically significant 
(P<0.01, Binomial exact test) intermediate frequencies of between 
1% and 13% (Table 3), suggesting that these mutations were 


Table 3 | Frequency of germline and somatic alleles in the metastatic and primary genomes 

Position      Locus     R  NR  Primary  Primary   Primary  Primary      Metastasis  Metastasis  M  Copy number classification
                               depth    NR ratio  P value  status       depth       NR ratio       (HMM state)
4:2203607     HAUS3     C  7   5700     0.5472    0.0000   Dominant     762         0.7874      S  Neutral (2)
16:23559936   PALB2     T  G   115      0.4957    0.0000   Dominant     669         0.4350      S  Amplification (4)
2:169497197   ABCB11    C  T   506      0.3261    0.0000   Dominant     959         0.3691      S  Amplification (4)
14:92018836   SLC24A4   G  A   13347    0.2341    0.0000   Dominant     13670       0.3518      S  Amplification (4)
17:10248420   MYH8      C  G   10657    0.1353    0.0000   Subdominant  1797        0.5932      S  Neutral (2)
3:110271286   MORC1     G  A   24572    0.0468    0.0000   Subdominant  32273       0.4107      S  Gain (3)
17:4848025    KIF1C     G  A   8587     0.0107    0.0000   Subdominant  2272        0.3077      S  Neutral (2)
11:113185109  USP28     C  T   6654     0.0095    0.0000   Subdominant  1387        0.4484      S  Gain (3)
18:58076768   KIAA1468  G  A   719      0.0083    0.0020   Subdominant  056         0.3059      S  Neutral (2)
19:12785252   RNASEH2A  G  A   6537     0.0029    0.0276   Subdominant  497         0.2806      S  Neutral (2)
4:106982671   GSTCD     G  T   7273     0.0008    0.9885   Absent       2208        0.2174      S  Neutral (2)
17:35701114   CDC6      G  T   4894     0.0008    0.9733   Absent       4208        0.3577      S  Amplification (4)
17:7751231    CHD3      G  A   9665     0.0007    0.9981   Absent       737         0.2671      S  Neutral (2)
4:155726802   FGA       C  T   5756     0.0007    0.9911   Absent       2287        0.2755      S  Gain (3)
17:7052251    DLG4      G  A   4383     0.0007    0.9835   Absent       706         0.3272      S  Neutral (2)
3:37267947    GOLGA4    G  T   13051    0.0006    0.9999   Absent       3262        0.2235      S  Gain (3)
9:84867250    RASEF     G  T   1690     0.0006    0.9500   Absent       796         0.3656      S  Gain (3)
17:35133783   ERBB2     C  A   3736     0.0005    0.9899   Absent       1722        0.3612      S  Amplification (4)
X:86659878    KLHL4     C  T   6561     0.0005    0.9993   Absent       977         0.3153      S  Neutral (2)
3:191172415   LPREL1    19 C   1963     0.0004    1.0000   Absent       8381        0.2148      S  Gain (3)
16:73500342   WDR59     C  ii  4846     0.0004    0.9982   Absent       1396        0.2629      S  Neutral (2)
1:44650831    RNF220    G  A   8160     0.0004    0.9999   Absent       967         0.2203      S  Neutral (2)
22:45035285   PKDREJ    C  T   6674     0.0003    0.9999   Absent       1230        0.3366      S  Gain (3)
11:61313958   C11ORF10  G  A   16705    0.0003    1.0000   Absent       14354       0.4651      S  Amplification (4)
12:52063157   SP1       G  T   7732     0.0003    1.0000   Absent       2011        0.2193      S  Amplification (4)
11:77452594   THRSP     C  T   24219    0.0002    1.0000   Absent       40652       0.4750      S  Gain (3)
17:45625043   COL1A1    C  A   26343    0.0001    1.0000   Absent       32259       0.2543      S  Amplification (4)
13:77076497   SCEL      A  G   49       0.0000    1.0000   Absent       187         0.5722      S  Gain (3)
19:9314428    -         A  G   76       0.5057    0.0000   Present      32          0.4953      G  Neutral (2)
4:130144460   -         A  T   2020     0.2188    0.0000   Present      2081        0.3099      G  Neutral (2)
8:27835012    -         G  A   3587     0.8602    0.0000   Present      10781       0.6667      G  Deletion (1)
6:32908543    -         C  T   4718     0.7484    0.0000   Present      16370       0.4897      G  Amplification (4)
20:43363061   -         G  A   5950     0.5249    0.0000   Present      5540        0.5049      G  Amplification (4)
4:8672089     -         G  A   381      1.0000    0.0000   Present      2850        0.8032      G  Gain (3)
16:1331138    -         C  T   677      0.4963    0.0000   Present      554         0.6245      G  High-level amplicon (5)

Only 7 germline alleles are shown; the full list is in Supplementary Table 7. The genome positions are shown as chr:coordinate. The primary read depth represents the number of reads. 
P values were calculated using a Binomial exact test. R, reference base; NR, non-reference base. Primary status indicates whether the variant was present, subdominant or absent in the primary 
tumour. Column M denotes somatic (S) or germline (G) single nucleotide variants in the metastasis. HMM state refers to the metastasis. 


Figure 1| RNA editing in COG3 and SRP9. Sanger sequence traces from the 
non-synonymous editing positions in COG3 and SRP9. The editing position 
is arrowed. Top trace is tumour RNA, bottom trace tumour DNA. The 
editing positions were confirmed with reverse strand reads (not shown). 


restricted to minor subclones of tumour cells. The remaining 19 out 
of 30 of the somatic coding mutations were not detected in the 
primary tumour DNA. Thus, significant heterogeneity in tumour 
somatic mutation content existed in the primary tumour at dia- 
gnosis. In contrast with the recently reported sequence of cytogen- 
etically normal acute myeloid leukaemia (AML) tumour’, significant 
evolution of coding mutational content occurred between primary 
and metastasis. It is unknown whether the 19 mutations present in 
the metastasis, but not detected in the primary, were a consequence of 
radiation therapy or innate tumour progression. 

We also examined how the transfer of information from the nuc- 
lear genome to proteins was modified by alternative splicing 
(Supplementary Table 8 and Supplementary Fig. 6), biased allelic 
expression (Supplementary Table 9) and RNA editing. At the single 
nucleotide level, RNA-editing enzymes (which can be regulated by 
oestrogens'') may also recode transcripts resulting in a proteome 
divergent from the genome’*". Interestingly, the ADAR enzyme— 
one of the principal RNA-editing enzymes that mediates A—I(G) 
edits—was one of the top 5% of genes expressed (145.6 reads per 
base, Supplementary Table 10), and the only editing enzyme 
expressed at a high level. We searched for potential editing events 
(Methods) and found 3,122 candidate edits in 1,637 gene loci 
(Supplementary Table 11). Some 526 out of 3,122 candidate edits 
are non-synonymous changes and 232 are synonymous changes 
(with the remainder affecting untranslated regions). We revalidated 
independently (Supplementary Methods) by Sanger sequencing 75 
editing events in 12 gene loci from the lobular metastasis 
(Supplementary Table 12 and see trace data at http://molonc.bccrc.ca/). 
Two genes, COG3 and SRP9 (Fig. 1), showed 
confirmed high frequency non-synonymous transcript editing, 
resulting in variant protein sequences. These observations emphasize 
the importance of integrating RNA-seq data with tumour genomes in 
assessing protein variation. 

The coding mutation landscape of breast cancers has, so far, been 
mostly determined from ER metastatic cell lines/samples®'’, and 
has suggested the presence of large numbers of passenger events as 
well as drivers. Our results show the importance of sequencing sam- 
ples of tumour cell populations early as well as late in the evolution of 
tumours, and of estimating allele frequency in tumour genomes. Our 
observations suggest that the sequencing of primary breast cancers 
and pre-invasive malignancy may reveal significantly fewer candi- 
dates for tumour initiating mutations. 


METHODS SUMMARY 


Paired-end reads were assigned quality scores and aligned to the reference gen- 
ome (hg18) using Maq”’ (Supplementary Methods and Supplementary Fig. 2). 
For identification of SNVs we used a simple Binomial mixture model, SNVMix 
(Supplementary Appendix 1), which assigns a probability to each base position 
as homozygous reference (aa), heterozygous non-reference (ab) and homo- 
zygous non-reference (bb), based on the occurrence of reference (hg18) and 
non-reference bases at each aligned position. This model was calibrated initially, 
using high confidence allele calls from Affymetrix SNP6.0 hybridization of 
tumour and normal DNA. We estimated the receiver operating characteristic 
(ROC) performance (Supplementary Fig. 8) and determined that an SNVMix 
threshold of P = 0.77 for (ab) or (bb) for a non-reference call would yield a false 
discovery rate (FDR) of 1%. For the RNA-seq library, a threshold of P = 0.53 was 
used (Supplementary Fig. 8; FDR = 0.01) to call non-reference positions. Non- 
reference positions were then filtered for known variants against the sources of 
germline variation, the single nucleotide polymorphism database (dbSNP) and 
the completed individual genomes'*”’ (Supplementary Table 2). Saturation of 
the libraries for SNV discovery was determined by random re-sampling 
(Supplementary Fig. 9 and Supplementary Methods). Segmental copy number 
was inferred with a hidden Markov model (HMM) method (Supplementary 
Table 4a, b and Supplementary Methods). 
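
As a rough illustration of the three-state Binomial mixture idea described above, the sketch below computes posterior probabilities for the genotypes aa, ab and bb from reference and non-reference read counts and applies the P = 0.77 threshold quoted for the genome library. It is not SNVMix itself: the per-genotype reference-base probabilities (`mu`) and the priors are fixed, hypothetical values rather than the calibrated ones, and the function names are invented for the example.

```python
import math

GENOTYPES = ("aa", "ab", "bb")

def genotype_posteriors(n_ref, n_nonref,
                        mu=(0.999, 0.5, 0.001),    # assumed P(reference base | genotype)
                        prior=(0.98, 0.01, 0.01)):  # assumed genotype priors
    """Posterior over (aa, ab, bb) given read counts at one position."""
    # The binomial coefficient is shared by all genotypes, so it cancels out.
    log_post = [math.log(pi) + n_ref * math.log(m) + n_nonref * math.log(1.0 - m)
                for m, pi in zip(mu, prior)]
    top = max(log_post)                       # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in log_post]
    z = sum(weights)
    return {g: w / z for g, w in zip(GENOTYPES, weights)}

def is_nonreference(post, threshold=0.77):
    """Call a non-reference position when P(ab) + P(bb) reaches the threshold."""
    return post["ab"] + post["bb"] >= threshold

het = genotype_posteriors(10, 10)
ref = genotype_posteriors(30, 0)
print(is_nonreference(het), is_nonreference(ref))
```

For the RNA-seq library the same call would be made with `threshold=0.53`, the value stated above.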

We searched for RNA-editing events by examining all very high confidence 
(P(ab) + P(bb) > 0.9) SNVMix predictions from the RNA-seq library of the 
metastatic tumour, that were not found with extreme confidence (P(aa) > 0.99, 
derived from the SNVMix receiver operating curve at FDR = 0.01) at the same 
positions in the metastatic tumour genome library. 
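
The editing screen in the preceding paragraph amounts to a two-sided filter on per-position genotype probabilities, which might be expressed as below. The data layout (dictionaries keyed by chromosome and coordinate), the function name and the example positions are hypothetical; the two cut-offs are the ones stated in the text, read here as "confidently variant in RNA, confidently homozygous reference in DNA".

```python
def candidate_edits(rna_calls, dna_calls, rna_min=0.9, dna_ref_min=0.99):
    """Positions that look non-reference in RNA but homozygous reference in DNA.

    Both arguments map (chrom, pos) -> {"aa": p, "ab": p, "bb": p}.
    """
    hits = []
    for site, rna in rna_calls.items():
        dna = dna_calls.get(site)
        if dna is None:
            continue  # no genome evidence at this position, so no call is made
        if rna["ab"] + rna["bb"] > rna_min and dna["aa"] > dna_ref_min:
            hits.append(site)
    return sorted(hits)

# Hypothetical probabilities for three positions.
rna = {
    ("chr1", 100): {"aa": 0.02, "ab": 0.90, "bb": 0.08},  # variant in RNA
    ("chr1", 200): {"aa": 0.03, "ab": 0.95, "bb": 0.02},  # variant in RNA and DNA
    ("chr2", 300): {"aa": 0.60, "ab": 0.39, "bb": 0.01},  # weak RNA evidence
}
dna = {
    ("chr1", 100): {"aa": 0.999, "ab": 0.001, "bb": 0.0},
    ("chr1", 200): {"aa": 0.010, "ab": 0.980, "bb": 0.01},
    ("chr2", 300): {"aa": 0.999, "ab": 0.001, "bb": 0.0},
}
print(candidate_edits(rna, dna))
```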


Received 4 September; accepted 10 September 2009. 

John F. Thompson, Jayson Bowers, Mirna Jarosz & Patrice M. Milos 


Our understanding of human biology and disease is ultimately 
dependent on a complete understanding of the genome and its 
functions. The recent application of microarray and sequencing 
technologies to transcriptomics has changed the simplistic view 
of transcriptomes to a more complicated view of genome-wide 
transcription where a large fraction of transcripts emanates from 
unannotated parts of genomes’’, and underlined our limited 
knowledge of the dynamic state of transcription. Most of this broad 
body of knowledge was obtained indirectly because current tran- 
scriptome analysis methods typically require RNA to be converted 
to complementary DNA (cDNA) before measurements, even 
though the cDNA synthesis step introduces multiple biases and 
artefacts that interfere with both the proper characterization and 
quantification of transcripts*'*. Furthermore, cDNA synthesis is 
not particularly suitable for the analysis of short, degraded and/or 
small quantity RNA samples. Here we report direct single molecule 
RNA sequencing without prior conversion of RNA to cDNA. We 
applied this technology to sequence femtomole quantities of 
poly(A)+ Saccharomyces cerevisiae RNA using a surface coated with 
poly(dT) oligonucleotides to capture the RNAs at their natural 
poly(A) tails and initiate sequencing by synthesis. We observed 
transcript 3’ end heterogeneity and polyadenylated small nucleolar 
RNAs. This study provides a path to high-throughput and low-cost 
direct RNA sequencing and achieving the ultimate goal of a comprehensive 
and bias-free understanding of transcriptomes. 

cDNA-based transcriptome analysis approaches being used today 
exhibit several shortcomings that prevent us from understanding the 
real nature of transcriptomes and ultimately genome biology. Some of 
these limitations are: (1) the tendency of various reverse transcriptases 
(RT) to generate spurious second-strand cDNA due to their DNA- 
dependent DNA polymerase activities”'®'’; (2) the generation of 
artefactual cDNAs due to template switching*'*’®'” or contaminating 
DNA and primer-independent cDNA synthesis’’”’; and (3) the error- 
prone’? and inefficient nature of RTs yielding low quantities of 
cDNA. Furthermore, most RNA analysis technologies require the 
synthesis of not just the first strand cDNA but also a second strand 
cDNA that are both subjected to further ligation/amplification steps, 
introducing yet more biases. These limitations pose problems for the 
determination of RNA strandedness!*”’, the identification of chimae- 
ric transcripts, quantification of RNA species, and the analysis of low 
quantity (<1 nanogram) or short RNA species, such as those obtained 
from formalin-fixed, paraffin-embedded tissue samples. Because 
almost all transcript analysis technologies in use today suffer from 
the limitations briefly summarized above, there is an ever-growing 
need for a method that would not be subject to the difficulties associated 
with RT behaviour, amplification, ligation and other cDNA 
synthesis/sample manipulation steps. A method allowing a comprehensive 
and bias-free view of transcriptomes using minute quantities 
of total RNA obtained from as few as one cell with no pre-treatment 
would stimulate great advances in the delineation of complex 
biological processes and be applicable across all biomedical research 
areas. 

Figure 1 | DRS chemistry and sequencing steps. a, Under optimized 
conditions, polymerase exhibits fast correct nucleotide incorporation (VT-A) 
and slow misincorporation (VT-G) kinetics. b, DRS procedure. Top left: 
polyadenylated and 3’-blocked RNA is captured on surfaces coated with 
dT(50) oligonucleotide. A ‘fill’ step is performed with natural dTTP, and a 
‘lock’ step with fluorescently labelled VT-A, -C and -G nucleotides. These 
steps correct for any misalignments that may be present in poly(A/T) 
duplexes, and ensure that the sequencing starts in the template rather than 
the poly(A) tail. Imaging is performed to locate the template positions. 
Bottom left: chemical cleavage of the dye-nucleotide linker is performed to 
prepare the templates for nucleotide incorporation. Top right: incubation 
with one VT nucleotide and polymerase is performed, followed by imaging 
to locate the templates that incorporated the nucleotide. Bottom right: 
chemical cleavage of the dye allows the surface and RNA templates to be 
ready for the next nucleotide addition cycle. 

Figure 2 | DRS sequencing read-length statistics. a, Cumulative length 
distribution of reads obtained from oligoribonucleotides. y axis shows the 
fraction of reads at and above particular x-read lengths. b, Distribution of 
read lengths greater than 20 nucleotides aligned to S. cerevisiae genome. 
c, Several DRS reads are aligned with BLAT and visualized using the UCSC 
genome browser. 

Figure 3 | DRS read distribution. a, b, Distance of aligned reads to 
S. cerevisiae coding sequence (a) and EST 3’ ends (b). y axis shows the 
fraction of reads at particular distance intervals indicated on the x axis. Most 
reads (91%) are 300 nucleotides immediately downstream of annotated gene 
3’ ends. Reads aligning within the coding regions of transcripts are shown 
with negative distance in a. b represents the distances between the DRS reads 
and the closest EST clones. c, DRS reads aligning to TDH3 and RPS3 are 
exemplified. The alignment direction of each read (black bars) is indicated as 
‘positive’ or ‘negative’. 

Here we report the successful development of direct RNA sequencing 
(DRS), allowing massively parallel sequencing of RNA molecules 
directly without prior synthesis of cDNA or the need for ligation/ 
amplification steps. DRS represents an extension of single-molecule 
DNA sequencing technology (tSMS)*’” that relies on the stepwise 
synthesis and direct imaging of billions of single DNA strands on a 
planar surface. The sequencing-by-synthesis reaction is performed 
using a modified polymerase and proprietary fluorescent nucleotide 
analogues, called Virtual Terminator nucleotides (VT), that contain a 
fluorescent dye and chemically cleavable groups that allow step-wise 
sequencing. The first step for the development of DRS was the iden- 
tification of an optimal polymerase, VT nucleotide analogues and 
buffer combination. Several DNA-dependent DNA polymerases have 
previously been shown to have reverse transcriptase activity**”; we 
therefore tested DNA polymerases in addition to known RTs. After 
screening studies performed in solution, we identified conditions with 
satisfactory reaction kinetics (Fig. 1a) that could be attempted in a 
single-molecule sequencing system. The DRS procedure is summarized 
in Fig. 1b. Briefly, Escherichia coli poly(A) polymerase I (PAPI) is used 
to generate an A tail on 3’ ends of RNA molecules. The control of the 
A-tail length and the 3’ end blocking is performed by introducing 
3'deoxyATP to the polyadenylation reaction shortly after the start of 
the tailing reaction, generating an A-tail of ~150 nucleotides. The 
blocking step is performed to prevent ‘downward’ nucleotide additions 
to the 3’ end of the template during the sequencing process. For RNA 
species containing poly(A) tails, such as mRNAs, poly(A) tailing is not 
required; only 3’ blocking is needed. Polyadenylated and 3'-blocked 
RNAs are hybridized to poly(dT)-coated surfaces. To begin sequencing 
at the unique region adjacent to the poly(A) tail, each RNA molecule is 
‘filled’ in with dTTP and polymerase, and then ‘locked’ in position with 
VT-A, -C and -G addition, stopping subsequent nucleotide additions. 
After washing away the unincorporated dye-labelled nucleotides, 
images are taken, and then the fluorescent dye and inhibitor are cleaved 
off the incorporated nucleotide, rendering it suitable for additional 
rounds of incorporation. Each molecule is then provided the oppor- 
tunity to extend (alternating C, T, A or G) followed by rinsing, imaging 
and cleavage. Repeating this cycle many times provides a set of images 
that are aligned and then used to generate sequence information for 
each individual RNA molecule with real-time image processing. 
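
The cyclic addition scheme described above can be made concrete with a toy simulation. The sketch below assumes ideal, error-free chemistry, at most one incorporation per template per cycle (consistent with the terminating behaviour attributed to the VT nucleotides), and a repeating C, T, A, G addition order; it also ignores the base added in the ‘lock’ step. The function names are invented for the example.

```python
FLOW_ORDER = "CTAG"  # assumed repeating order of VT nucleotide additions
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}

def expected_read(rna_template):
    """DNA strand synthesized from the 3' end of the RNA (its reverse complement)."""
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(rna_template))

def simulate_cycles(rna_template, n_cycles=120):
    """Bases called after n_cycles single-nucleotide additions, with no errors."""
    target = expected_read(rna_template)
    pos = 0
    calls = []
    for cycle in range(n_cycles):
        base = FLOW_ORDER[cycle % len(FLOW_ORDER)]
        # The template extends only if the offered base matches the next position.
        if pos < len(target) and target[pos] == base:
            calls.append(base)
            pos += 1
    return "".join(calls)

# A homopolymer needs one full round of four additions per repeated base.
print(simulate_cycles("CC", n_cycles=7), simulate_cycles("CC", n_cycles=8))
```

Because each cycle offers a single nucleotide, the number of bases read after a fixed number of cycles depends on the template sequence, which is one reason read lengths vary between molecules even before errors are considered.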

We first used chemically synthesized 40-mer RNA oligoribonucleo- 
tides as a model system to develop and optimize DRS chemistry. After 
sequencing on a prototype sequencer with 120 cycles of alternating 
VT-C, -T, -A or -G additions, we aligned the resultant sequence reads 
to the input oligonucleotide reference sequences and observed 48.5% 

Figure 4 | S. cerevisiae poly(A)+ RNA DRS suggests overlapping 
transcription units and polyadenylated snoRNA and rRNA species. a, DRS 
reads in this region are aligned in the reverse direction, suggesting their 
origination from the HHF2 transcript (not shown) located ~200 nucleotides 
upstream relative to annotated KTR5 3’ end and transcribed in the forward 
direction. Three reads extend into the KTR5 coding sequence, suggesting 
that the HHF2 transcription can extend into KTR5 coding sequence. 
b, Figure exemplifies randomly selected reads aligning to intronic SNR18 
(top panel) and 5.8S rRNA (bottom panel). Note that the reads are aligning 
within the 3’ ~40-nucleotide annotated regions of snoRNAs and rRNAs, 
suggesting their polyadenylation during or after their maturation. 

of aligned reads to have a sequence length of at least 20 nucleotides, 
with the longest perfect match (no errors) being 38 nucleotides (Fig. 2a 
and Supplementary Fig. 5). In terms of total strand yield, DRS was 
efficient, providing on average 972 reads per 1,000 µm² flow cell 
surface area compared to ~1,100 for tSMS DNA sequencing in similar 
conditions. Because the sequencing process relies on incubating the 
templates with one base at a time, all errors are single base errors in the 
form of deletions (failure to detect incorporation), insertions (for 
example, failure to rinse VT analogues from the flow cell between 
each addition cycle) and substitutions. Total raw base error rate for 
DRS is currently approximately 4%, dominated by missing base errors 
(2-3%), whereas the insertion rate is 1-2% and the substitution error 
rate is 0.1-0.3%. Although further improvements in error rates are in 
progress, the read lengths and error rates achieved here are sufficient 
to allow the use of standard computational methods to align 
sequences to reference transcriptomes and genomes. 
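
To give a feel for what the error rates quoted above imply for individual reads, the following back-of-the-envelope sketch treats errors as independent per base, which is a simplifying assumption not made in the text. The function names are illustrative; the 4% rate and the read lengths are the figures given above.

```python
def p_error_free(read_length, per_base_error):
    """Probability that a read of this length contains no error at all."""
    return (1.0 - per_base_error) ** read_length

def expected_errors(read_length, per_base_error):
    """Mean number of errors in a read of this length."""
    return read_length * per_base_error

RAW_ERROR = 0.04  # approximate total raw base error rate stated in the text

for length in (20, 38):
    print(length, p_error_free(length, RAW_ERROR), expected_errors(length, RAW_ERROR))
```

Under this assumption a typical read carries about one error, which is within what standard aligners tolerate for reads of this length.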

We then sequenced Saccharomyces cerevisiae poly(A)+ RNA with 
DRS. Because this RNA sample already contains a natural, pre-existing 
poly(A) tail, no additional tailing was needed. Two femtomoles (~2 
nanograms) of 3’ end-blocked yeast poly(A)+ RNA was hybridized to 
dT(50) flow cells with no additional sample preparation procedures. 
One hundred and twenty sequencing cycles were performed on a 
prototype sequencing system over 3 days. RNA stability remained at 
high levels during the run, as demonstrated by the relatively constant 
number of nucleotides added per addition cycle to RNA templates 
(Supplementary Fig. 2). The sequence run generated 41,261 reads 
greater than 20 nucleotides, of which 19,501 reads (48.4%) aligned 
to the yeast genome using the BLAT algorithm”®. The average aligned 
read length was 28.7 nucleotides, with the longest perfect match 
aligned read being 50 nucleotides (Fig. 2b, c). Of the aligned reads 
91% were within 400 nucleotides downstream of annotated yeast 
gene 3’ open reading frame sequence ends (Fig. 3a). Such a wide 
distribution is expected, as yeast 3’ gene annotations mark mostly 
the coding sequence end point rather than the polyadenylation site. 
As expected, the alignment orientation of these sequences was in the 
opposite direction to the known gene transcription direction. This is a 
result of using unmodified, intact yeast poly(A)+ RNA without any 
additional sample preparation steps, and therefore, the sequence read 
matches the direction opposite to that of the transcript. Because the 3’ 
ends of yeast protein-coding genes are not well annotated, we com- 
pared our findings to the yeast expressed sequence tags (EST) database 
as well. As exemplified in Fig. 3b and c, our data are supported by the 
EST data, with most of our reads being in close proximity to EST 3’ 
ends, aligned in the direction opposite to transcription. The reads that 
did not align to proximal 3’ ends of yeast genes or EST clones were 
caused by their localization beyond the 500 nucleotide downstream 
regions examined, at the 3’ ends of potentially transcriptionally active 
retrotransposons or at the 3’ ends of transcripts classified as dubious 
(Supplementary Fig. 3). Comparison of the DRS read localizations to 
the transcript 3’ ends identified previously’ using high-throughput 
cDNA sequencing revealed a high concordance, with 81% of the DRS 
reads being within +20 nucleotides of their 3’-end annotations 
(Supplementary Fig. 6). We observed transcripts extending into the 
coding sequence of neighbouring genes (Fig. 4a), some of which were 
supported by the available EST data and as described’. Interestingly, 
~2% of the total reads aligning within the coding regions of transcripts 
(Fig. 3a) were from the ribosomal RNAs (rRNAs) and a portion 
of small nucleolar RNAs (snoRNAs). Mature forms of these RNAs are 
produced from longer precursor RNAs through cleavage steps. Our 
observation that DRS reads map to the 3’ end of the mature snoRNAs 
and rRNAs indicates that at least a fraction of snoRNAs*””* and 
rRNAs” can be polyadenylated post-transcriptionally, possibly 
during their 3’-end processing and/or RNA quality control steps” 
(Fig. 4b and Supplementary Fig. 7). We independently validated the 
3’ polyadenylation site heterogeneity and the existence of polyadeny- 
lated snoRNAs by amplifying 3’ polyadenylation sites with poly- 
merase chain reaction (PCR) in a manner preserving the variability 
in the 3’ ends, followed by sequencing of the PCR products with tSMS 
DNA sequencing to identify the 3’ polyadenylation sites (Supplementary 
Fig. 11). Our data add further support to the suggestion that many 
yeast genes possess a heterogeneous set of 3’ ends’. 
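
Statements such as "81% of the DRS reads being within ±20 nucleotides of their 3’-end annotations" rest on a simple distance calculation, sketched below. This is a simplified illustration rather than the analysis actually performed: it works on a single chromosome and strand, assumes a non-empty list of annotated ends, and uses made-up coordinates and invented function names.

```python
import bisect

def nearest_offset(read_pos, sorted_ends):
    """Signed distance from a read position to the closest annotated 3' end."""
    i = bisect.bisect_left(sorted_ends, read_pos)
    candidates = sorted_ends[max(0, i - 1): i + 1]  # neighbours on either side
    return min((read_pos - end for end in candidates), key=abs)

def fraction_within(read_positions, annotated_ends, window):
    """Fraction of reads lying within +/- window nucleotides of an annotated end."""
    ends = sorted(annotated_ends)
    offsets = [nearest_offset(p, ends) for p in read_positions]
    return sum(abs(o) <= window for o in offsets) / len(offsets)

# Hypothetical coordinates on one chromosome and strand.
ends = [1000, 5000]
reads = [1005, 990, 1400, 4985, 5019]
print(fraction_within(reads, ends, window=20))
```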

The simplicity of the DRS sample preparation steps presented here, 
the requirement for only femtomole quantities of RNA and the poten- 
tial of DRS to eliminate biases introduced by cDNA synthesis, end 
repair, ligation and amplification procedures will be useful for appli- 
cations requiring minute quantities of RNA and/or short RNA species 
that are challenging for analysis with existing cDNA-based methodo- 
logies. This ability, combined with further improvements in DRS 
sample preparation, single molecule sequencing surface capture, 
throughput and computational tools, will ultimately allow us to 
understand and quantify the ‘true’ nature of transcriptomes in a 
high-throughput, low-cost and bias-free manner. 


METHODS SUMMARY 


RNA oligoribonucleotide templates were obtained from IDT. S. cerevisiae 
poly(A)+ RNA was obtained from Clontech. Polyadenylation of oligoribonucleotides 
was performed by using a poly(A) tailing kit (Ambion). 3’deoxyATP 
(cordycepin triphosphate, Jena Biosciences) was introduced 10 min after the 
initiation of the polyadenylation reaction for 3’-end blocking and tail length 
limitation. DRS reads obtained are listed in Supplementary Table 4. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Polymerase kinetics assay. Incorporations of VT nucleotide analogues in an 
RNA-template-directed manner by various enzyme and buffer combinations 
were screened by designing four 50-mer oligoribonucleotides 
(5’-UUCUUUUGCCUCUUUCGNCAGGGCAGAGGAUGGAUGCAAGGAUAAGUGGA-3’); 
the 5’ 25-nucleotide sequence of the oligoribonucleotides being complementary 
to a 25-mer 5’-rhodamine-labelled DNA oligo 
(5’-TCCACTTATCCTTGCATCCATCCTCTGCCCTG-3’), and the 26th nucleotide (denoted as N above) on 
the oligoribonucleotides being each of the four nucleotides. After hybridizing the 
25-mer DNA oligonucleotide to RNA templates at 65 °C for 5 min in nuclease- 
free water, followed by incubation on ice for 2 min, the selected enzyme/buffer/VT 
combinations were added to the RNA-DNA hybrid mix. The reaction was 
stopped at different time points by an EDTA-quench and kinetics were measured 
by observing the lengthening of the 5’-rhodamine-labelled DNA oligonucleotide 
using capillary electrophoresis (ABI 3730 DNA Analyzer, Applied Biosystems). 
This assay allowed us to observe the kinetics of VT nucleotide incorporation into 
the 3’ end of the DNA primer in an RNA-template-directed manner. All oligo- 
nucleotides and oligoribonucleotides were ordered from IDT. 
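
The primer and template design described above can be checked directly from the two sequences. The sketch below takes the strings exactly as printed in this section, computes where the DNA primer pairs with the RNA template, and reports which template base would direct the first VT incorporation. It relies only on the printed sequences rather than on the stated lengths, and the function names are invented for the example.

```python
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement_dna(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(seq))

def primer_site(rna_template, dna_primer):
    """0-based start of the primer-binding site in the template, or -1 if absent."""
    site = reverse_complement_dna(dna_primer).replace("T", "U")
    return rna_template.find(site)

TEMPLATE = "UUCUUUUGCCUCUUUCGNCAGGGCAGAGGAUGGAUGCAAGGAUAAGUGGA"  # as printed, N variable
PRIMER = "TCCACTTATCCTTGCATCCATCCTCTGCCCTG"                      # as printed

start = primer_site(TEMPLATE, PRIMER)
# The primer's 3' end pairs with TEMPLATE[start]; extension copies the base 5' of it.
first_templated = TEMPLATE[start - 1]
print(start, len(PRIMER), first_templated)
```

With these strings the primer pairs with the 3’ end of the template and the variable position N is the first base to be copied, which is what the kinetics assay requires.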

Sample preparation for DRS. RNA oligoribonucleotide templates were ordered 
from IDT. The sequences were 
Oligo 1, 5’-AGAGUCCCAUCCUCACCAUCAUCACACUGGAAGACUGCAG-3’; 
Oligo 2, 5’-CUGGUGCAGCACUCUCGACGGCACCUAUCUGCCAUCGUAG-3’; 
Oligo 3, 5’-CGAUCGUCACUAUCUGCAUCAGUAGCUCUAGCAUACUGAG-3’; 
Oligo 4, 5’-UCUUUCGUCAGGGCAGAGGAUGGAUGCAAGGAUAAGUGGA-3’. 
Polyadenylation was performed 
by using a poly(A) tailing kit (Ambion). 3’deoxyATP (cordycepin triphosphate, 
Jena Biosciences) was introduced 10 min after the initiation of the polyadenylation 
reaction for 3'-end blocking and tail length limitation. Reaction products were 
cleaned by phenol-chloroform extraction and ethanol precipitation. Samples were 
analysed with microcapillary electrophoresis (Agilent Technologies) (Supplementary 
Fig. 1). Poly(A)+ S. cerevisiae RNA from strain DBY746 (his3D1 leu2-3 leu2-112 
ura3-52 trp1-289), grown under standard conditions (yeast peptone dextrose, 
30 °C), was obtained from Clontech (product number 636312). S. cerevisiae 
poly(A)+ RNA (2 ng) was used for the 3’-end blocking reaction with poly(A) polymerase 
and 3’deoxyATP. 

Surfaces and template capture. Fifty-nucleotide poly(dT) primers (obtained 
from IDT) were covalently coupled to sequencing surfaces prepared on glass 
coverslips in one-channel or five-channel formats. Slides are available from 
Helicos BioSciences. Poly(A) tail containing RNA molecules were hybridized 
to the surface at 10-30 pM, requiring 0.5-1.5 fmol polyadenylated RNA per 
sequencing reaction. The surface was rinsed and the locations of the hybridized 
templates were determined by imaging after the ‘fill and lock’ step. 
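
The concentration and amount ranges quoted above are linked by the hybridization volume, which is not stated explicitly. A short unit conversion, assuming the two ends of each range correspond to one another, suggests the volume they imply; the function name is illustrative.

```python
def hybridization_volume_ul(amount_fmol, concentration_pm):
    """Volume in microlitres that holds amount_fmol at concentration_pm.

    fmol / pM = (1e-15 mol) / (1e-12 mol per litre) = 1e-3 litre = 1e3 microlitres.
    """
    return amount_fmol / concentration_pm * 1e3

low = hybridization_volume_ul(0.5, 10)    # lower ends of the quoted ranges
high = hybridization_volume_ul(1.5, 30)   # upper ends of the quoted ranges
print(low, high)
```

Both ends of the ranges give the same volume, so the figures are at least internally consistent with a single fixed loading volume per sequencing reaction.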

cDNA preparation, PCR amplification and DNA sequencing with the Helicos 
Genetic Analysis System. First-strand cDNA was prepared using a SuperScript 
III first-strand cDNA synthesis kit (Invitrogen) from 500 ng poly(A)+ 
S. cerevisiae RNA according to manufacturer’s instructions using 50 pmol dT/U-25-V 
primer (Supplementary Table 1). After cDNA synthesis, RNA was 
removed by RNase H (Invitrogen) and RNase If (New England Biolabs) digestion 
for 30 min at 37 °C followed by cleaning with a Nucleotide Removal Kit (Qiagen, 
28304). The cDNA is then PCR-amplified with the dT/U-25-V primer and gene/ 
snoRNA specific primers (Supplementary Table 1) using Taq polymerase (New 
England Biolabs, M0273) under the following thermal cycling conditions: 94 °C 
for 3 min, 30 cycles of 94 °C for 30 s, 48 °C for 30 s, 72 °C for 30 s, followed by a 
final 72 °C 10 min incubation step. The excess primers were removed by running 
the PCR products on 1% agarose gel. Because 3’ ends of genes are amplified, and 
multiple and variable size PCR products are expected, we extracted regions from 
the gels representing 50-500 base pairs (bp) size distribution (visible PCR 
products had 100-300 bp sizes) and isolated the DNA with a QIAEX II gel 
extraction kit (Qiagen, 20021). We chose this approach over commercial 
column/bead-based cleaning methods because: (1) PCR primers need to be 
removed as much as possible, otherwise they will be A-tailed by the terminal 
transferase (described below) and sequenced; and (2) as we expected multiple 
and variably sized PCR products, we did not want to use commercial systems 
that may have varying efficiencies of removing small fragments (generally 
<100 bp) and preserving larger DNA fragments. The PCR products were then 
treated with the USER enzyme (New England Biolabs, M5505) to eliminate/ 
reduce the 5’-T/U tails left on the PCR products after PCR (Supplementary 
Fig. 11A). This step was performed to prevent potential competition of the 
5'-T/U tail with the Poly(dT) primers on sequencing surfaces used for template 
capture and sequencing initiation. The USER reaction was cleaned with the 
Qiagen Nucleotide Removal Kit. Ten nanograms from each PCR product was 
combined and A-tailed with terminal transferase (New England Biolabs). Briefly, 
pooled PCR products were heat denatured at 95 °C for 5 min in the presence of 
the supplied 1× reaction buffer and 2.5 mM CoCl₂, followed by rapid snap-
cooling on ice. Terminal transferase (40 U; New England Biolabs) and 900 pmol
dATP were then added to the denatured DNA in a 50 µl final reaction volume,
incubated at 37 °C for 1 h, followed by the inactivation of the enzyme at 70 °C for 
10 min. The blocking step was performed by adding 300 pmol ddTTP and 4 U of 
terminal transferase to the heat-denatured A-tailed reaction above, incubating at 
37°C for 1h, followed by the inactivation of the enzyme at 70°C for 20 min. 
After the tailing and blocking steps, the final DNA was then loaded directly into 
two channels of the 50-channel Helicos Genetic Analysis System without additional
cleaning steps. cDNA sequencing data alignment to the yeast genome
(October 2003 assembly) was performed using IndexDP”’. 
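
The thermal cycling conditions given above can be written down as data and summed to estimate the programmed hold time. This is a minimal sketch: the `Step` class and function name are hypothetical, the duration of the 72 °C step inside each cycle is assumed to be 30 s (the unit is not legible in the source), and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in degrees Celsius
    seconds: int    # hold duration in seconds


INITIAL = Step(94, 3 * 60)                           # 94 C for 3 min
CYCLE = [Step(94, 30), Step(48, 30), Step(72, 30)]   # last hold assumed to be 30 s
N_CYCLES = 30
FINAL = Step(72, 10 * 60)                            # final 72 C for 10 min


def total_hold_seconds(initial: Step, cycle: list, n_cycles: int, final: Step) -> int:
    """Sum programmed hold times only; ramping between temperatures is not modelled."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


if __name__ == "__main__":
    total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"Programmed hold time: {total} s ({total / 60:.0f} min)")
```

With these assumptions the programme holds for 3,480 s, or 58 min, before any ramping overhead.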


30. Lipson, D. et al. Quantification of the yeast transcriptome by single-molecule 

JAK2 phosphorylates histone H3Y41 and excludes HP1α from chromatin


Mark A. Dawson, Andrew J. Bannister, Berthold Göttgens, Samuel D. Foster, Till Bartke,
Anthony R. Green & Tony Kouzarides


Activation of Janus kinase 2 (JAK2) by chromosomal transloca- 
tions or point mutations is a frequent event in haematological 
malignancies'*. JAK2 is a non-receptor tyrosine kinase that 
regulates several cellular processes by inducing cytoplasmic 
signalling cascades. Here we show that human JAK2 is present 
in the nucleus of haematopoietic cells and directly phosphorylates 
Tyr 41 (Y41) on histone H3. Heterochromatin protein 1α (HP1α),
but not HP1β, specifically binds to this region of H3 through its
chromo-shadow domain. Phosphorylation of H3Y41 by JAK2 
prevents this binding. Inhibition of JAK2 activity in human 
leukaemic cells decreases both the expression of the haematopoietic
oncogene lmo2 and the phosphorylation of H3Y41 at its promoter,
while simultaneously increasing the binding of HP1α at the
same site. These results identify a previously unrecognized nuclear 
role for JAK2 in the phosphorylation of H3Y41 and reveal a direct 
mechanistic link between two genes, jak2 and lmo2, involved in
normal haematopoiesis and leukaemia’. 

JAK2 signalling is implicated in various biological processes, 
including cell cycle progression, apoptosis, mitotic recombination, 
genetic instability and alteration of heterochromatin’®*. The most 
common somatic alteration of JAK2 is a gain-of-function mutation 
(JAK2 V617F) associated with human myeloproliferative diseases’. 
The diverse roles of JAK2 in normal and leukaemic haematopoiesis 
are believed to be mediated by cytoplasmic signalling pathways'”. 

We found that JAK2 has a previously unrecognized nuclear pool in 
haematopoietic cells. Figure 1a and Supplementary Fig. 1 show JAK2
within the nuclei of three cell lines (HEL, UKE1 and SET2) harbouring 
JAK2 V617F"*. However, JAK2 staining was also nuclear in K562 cells, 
which express wild-type JAK2. Nuclear JAK2 was also observed in 
primary cells, positive for the CD34 stem-cell antigen, obtained from 
a patient with JAK2 V617F-positive post-polycythaemic myelofibrosis.
Transfection of JAK2 into a JAK2-null background, γ2A
cells'’, independently confirmed the nuclear localization of JAK2 
and validated the specificity of the antibodies used in immunofluor- 
escence (Fig. 1b and Supplementary Fig. 2). Finally, subcellular 
fractionation experiments using HEL cells also demonstrated JAK2 
in the nucleus (Fig. 1c). Taken together, these results demonstrate 
that a significant proportion of JAK2 is present within the nuclei of 
haematopoietic cells, irrespective of JAK2 mutation status. 

To explore the role of JAK2 within the nucleus we investigated the 
possibility that histones could be a substrate. Among all core histones, 
we found that recombinant JAK2 specifically phosphorylated histone 
H3, a reaction inhibited by the JAK2 inhibitor TG101209 (ref. 16) 
(Fig. 2a and Supplementary Fig. 3). H3 contains three highly 
conserved tyrosine residues, one of which, H3Y41, is positioned at 
the amino terminus of the first helix of H3 (the αN1-helix) where
DNA enters the nucleosome (Fig. 2b). Given its accessible location, we
reasoned that H3Y41 might be the JAK2 target; we therefore generated 
an antibody against phosphorylated H3Y41 (H3Y41ph), verified its 
specificity (Supplementary Figs 4 and 5) and used it to demonstrate 

Figure 1 | JAK2 is present in the nucleus of haematopoietic cells.
a, Confocal immunofluorescent images identify nuclear JAK2 in
haematopoietic cell lines and primary CD34+ peripheral blood stem cells
(CD34+). CN, copy number; DAPI, 4′,6-diamidino-2-phenylindole;
V617F:WT, the ratio of JAK2 V617F to JAK2 wild-type. Two primary anti-
JAK2 antibodies were used (detailed in Methods and Supplementary Fig. 2c).
b, Confocal images of JAK2-null (γ2A) cells transfected with JAK2.
c, Western blotting of cytoplasmic (C) and nuclear (N) extracts
demonstrates that JAK2 is present in both cellular compartments; however,
β-tubulin (anti-tubulin) is confined to the cytoplasmic fraction.

Figure 2 | JAK2 phosphorylates H3Y41 in vitro and in vivo. a, In vitro kinase
assay using [γ-³²P]ATP and recombinant JAK2 (rJAK2). b, H3 (blue) is
shown in the nucleosome; the inset highlights H3Y41 at the amino terminus
of the first helix. c, In vitro kinase assay followed by western blot analysis
using the H3Y41ph antibody. d, H3Y41ph was measured in chromatin from
the indicated cell lines by western blotting. e, After serum starvation, K562
cells were stimulated with LIF. H3Y41ph and phospho-JAK2 (anti-JAK2ph)
were measured by western blot analyses of whole cell extracts. f, H3Y41ph
was determined in chromatin from γ2A cells, transfected with either wild-type
JAK2, JAK2 V617F or empty vector. g, H3Y41ph was measured in
chromatin from HEL cells grown in the presence of a specific JAK2 inhibitor
(TG101209) or vehicle control (dimethylsulphoxide; DMSO).
h, Quantification of western blot in g. Similar results were obtained with a
second specific JAK2 inhibitor (AT9283; data not shown).


phosphorylation of H3Y41 by recombinant JAK2 in vitro (Fig. 2c, 
lanes 1-6). H3Y41 phosphorylation was inhibited by TG101209 and 
by mutation of H3Y41 to phenylalanine (Fig. 2c, lanes 7-10). Cellular 
JAK2, immunoprecipitated from HEL cells, also phosphorylated 
H3Y41 (Supplementary Fig. 6). 

To assess the phosphorylation of H3Y41 in vivo, chromatin pre- 
parations from six cell lines were probed with the H3Y41ph antibody. 
H3Y41 phosphorylation was more abundant in cell lines containing 
active JAK2 (SET2, HEL, UKE1 and K562)"*, whereas H3Y41ph was 
significantly decreased in HL60 cells and γ2A cells, which lack detectable
JAK2 (refs 15, 17) (Fig. 2d). Stimulation of K562 cells with
leukaemia inhibitory factor (LIF) or platelet-derived growth factor- 
BB (PDGF-BB) resulted in activation of JAK2, as demonstrated by 
JAK2 phosphorylation, and produced a concomitant increase in 
H3Y41ph (Fig. 2e and Supplementary Fig. 7a). Moreover, stimu- 
lation of murine BaF3 cells with interleukin-3 (IL-3), a cytokine that 
signals by means of JAK2 in these cells, also increased H3Y41ph 
(Supplementary Fig. 7b). Taken together, these data demonstrate 
that H3Y41ph is present in vivo and that cytokine signalling regulates 
its levels. 

The presence of residual H3Y41ph in HL60 and JAK2-null γ2A
cells indicates that JAK2 is not the only kinase responsible for this
modification. However, transfection of JAK2 into γ2A cells demonstrated
that it is one of the cellular kinases responsible for H3Y41ph in
vivo (Fig. 2f). To provide further evidence that JAK2 phosphorylates
H3Y41 in vivo, we used two specific, chemically distinct JAK2 inhi- 
bitors, TG101209 and AT9283 (refs 16, 18). Chromatin prepared 
from HEL cells grown in the presence of either JAK2 inhibitor con- 
tained significantly decreased H3Y41ph compared with non-treated 
cells (Supplementary Fig. 8b). These changes were not a consequence 
of broad effects on cell cycle or apoptosis (Supplementary Fig. 8a). 
Moreover, inhibition of JAK2 produced a rapid and sustained loss of 
H3Y41ph. The decrease in H3Y41ph occurred within 15 min, and by 
1h an 80% decrease was observed (Fig. 2g, h). The rapidity of this 
response, together with the in vitro data, indicates that JAK2 directly 
phosphorylates H3Y41 in vivo. 

Figure 3 | HP1α binds the Y41 region of H3 in a phosphorylation-dependent
manner. a, The soluble fraction from permeabilized nuclei was analysed by
western blotting for HP1α and HP1β. b, HP1α and HP1β were tested for
their ability to bind either unmodified H3(31-56) peptides (unmod) or
identical peptides phosphorylated at H3Y41 (H3Y41ph). Anti-rabbit IgG
detects H3Y41ph antibody. c, HP1α contains three domains: a chromo
domain (CD), a hinge (H) and a chromo-shadow domain (CSD). The
indicated regions were expressed and tested for binding to the unmodified
region of H3(31-56). GST, glutathione S-transferase. d, CSD binding to
H3(31-56) is abrogated by phosphorylation of H3Y41. e, HL60 cells were
permeabilized and the binding of HP1α to chromatin was challenged by
competition with the indicated peptides. HP1α localization was then
revealed by immunofluorescence. f, Quantification of the amount of HP1α
observed in e. H3(31-56)un, unmodified H3(31-56) peptide; n, number of
nuclei counted; error bars represent s.d. g, HEL cells were treated with
DMSO, TG101209 or AT9283. Permeabilized nuclei were prepared and
challenged with H3K9me3 peptide to dissociate a relatively small
percentage of HP1α from chromatin. Chromatin and soluble fractions were
then western blotted for HP1α and H3K9me3.

The JAK pathway in Drosophila melanogaster has recently been
implicated in the alteration of heterochromatin through the disruption
of HP1 (ref. 13). We therefore investigated whether JAK2 signalling
in a haematopoietic cell line (HEL) affected the association of
HP1α or HP1β with chromatin. Figure 3a shows that there was a
significant amount of soluble, non-chromatin-bound HP1α in permeabilized
HEL nuclei, whereas HP1β was essentially bound to chromatin.
This observation raised the possibility that JAK2
signalling in HEL cells may weaken HP1α binding and/or stabilize
the binding of HP1β.

Given that JAK2 directly phosphorylates H3Y41, we considered the
possibility that H3 may contain an additional binding site for HP1α or
HP1β near Y41. Figure 3b and Supplementary Fig. 9a, b show that
HP1α bound specifically to an unmodified H3 peptide encompassing
amino-acid residues 31-56. HP1α binding was markedly decreased
when the peptide was phosphorylated at Y41. In contrast, HP1β
bound neither the unmodified nor the modified peptide. The integrity
of the H3Y41ph peptide was demonstrated by the fact that the
H3Y41ph antibody bound only the phosphorylated peptide.
Binding to the Y41 region of H3 was specifically mediated by the
chromo-shadow domain (CSD) of HP1α, whereas its chromo domain
has been shown to be responsible for its interaction with methylated
H3K9 (H3K9me) (Fig. 3c)'°’°. Indeed, H3K9me peptides added in 
trans neither stimulated nor inhibited the binding of HP1α to the Y41
region of H3 (Supplementary Fig. 9c), but H3Y41 phosphorylation
inhibited binding of the CSD to H3 (Fig. 3d). Together these data
demonstrate that the CSD of HP1α binds the Y41 region of H3 and
that this binding is inhibited by H3Y41 phosphorylation.

To further characterize HP1α binding to H3 in a more physiological
context, we performed immunofluorescence experiments with peptide
competition. Figure 3e, f shows that peptides spanning H3 residues
31-56, and peptides containing trimethylated H3K9 (H3K9me3),
displaced HP1α from nuclear heterochromatic speckles’’. In contrast,
a H3(31-56) peptide phosphorylated at Y41 (H3Y41ph) was unable to
displace HP1α efficiently from heterochromatin. We next examined
whether binding of HP1α is modulated by JAK2 signalling in vivo.
Permeabilized nuclei were prepared from HEL cells cultured with or
without JAK2 inhibitors. Inhibition of JAK2 increased the proportion
of chromatin-bound HP1α within these nuclei (Fig. 3g; compare lanes
2 and 3 with lane 1). Whereas the level of H3Y41ph was decreased by
inhibition of JAK2 (Fig. 2g and Supplementary Fig. 8b), the level of
H3K9me3 was unaltered, which is consistent with the concept that
JAK2 signalling reduces HP1α binding by phosphorylating H3Y41.

To investigate the biological consequences of H3Y41 phosphory- 
lation, we used expression arrays to identify JAK2-regulated genes in 
HEL cells (Fig. 4a and Supplementary Fig. 10a). Of those genes whose 
messenger RNA levels were most decreased by inhibition of JAK2, 
several have previously been identified as transcriptional targets of 


[Figure 4 graphic. a, Table of the 40 most downregulated genes of 18,164 tiled transcripts (columns: gene ID, log2 fold change, number of STAT5 sites). b, Schematic of the lmo2 locus (amplicons 1-5) and ChIP analysis over lmo2 for HP1α, H3Y41ph and H3K4me3. c, Model contrasting regulated and dysregulated JAK2 activity.]


Figure 4 | JAK2 signalling regulates the expression of the lmo2 oncogene.
a, Messenger RNA and chromatin (used in b) were isolated from HEL cells
treated for 4 h with TG101209 or DMSO. Messenger RNA from two
biological replicates was used to generate a gene expression profile. The 40
most downregulated genes are illustrated. The number of predicted STAT5
DNA-binding sites in each locus is indicated. The yellow wedge highlights
the lmo2 gene. b, Five regions within the lmo2 locus were investigated
(amplicons 1-5; see schematic representation of lmo2 locus) by chromatin
immunoprecipitation analyses with antibodies against HP1α, H3Y41ph and
H3K4me3. The data were normalized for H3 occupancy. Error bars
represent s.d. for each amplicon. c, Model depicting the decrease in HP1α
binding to chromatin after phosphorylation of H3Y41 by JAK2. On the left
are the known functions of HP1α; on the right are the known consequences
of dysregulated JAK2 seen as a feature in JAK2-mediated haematological
malignancies.

the canonical JAK2-STAT5 pathway (reviewed in ref. 21). However,
this approach also identified JAK2-regulated genes, including lmo2,
that lacked a predicted STAT5-binding site as defined previously”. 
lmo2 is essential for normal haematopoietic development, has been
implicated in leukaemogenesis”’ and was in the top 0.5% of genes 
downregulated by inhibition of JAK2. The link between lmo2
expression and JAK2 inhibition has been noted previously”; this 
was further confirmed by quantitative polymerase chain reaction 
with reverse transcription (RT-PCR) and with a second JAK2 inhib- 
itor, AT9283 (Supplementary Fig. 10b, c). Chromatin immunopre- 
cipitation was then employed to investigate chromatin structure at 
lmo2 after inhibition of JAK2. Downregulation of lmo2 expression
(corroborated by decreased levels of H3K4me3) was accompanied by
decreased levels of H3Y41ph together with a reciprocal increase in the
binding of HP1α, but not HP1β, at sites surrounding the lmo2 transcriptional
start site (Fig. 4b, Supplementary Fig. 11 and data not
shown). The promoter of B2M, a housekeeping gene encoding
β2-microglobulin, and two sites upstream of the lmo2 promoter
showed no changes in H3K4me3, H3Y41ph or HP1α (Fig. 4b and
Supplementary Fig. 12). Collectively, these results demonstrate that
JAK2 signalling results in H3Y41 phosphorylation and the exclusion
of HP1α from the lmo2 promoter.

The data presented here demonstrate a novel nuclear function for 
JAK2 that is distinct from its established role as an initiator of cytoplasmic
signalling cascades. In the nucleus, JAK2 mediates the phosphorylation
of H3Y41 and excludes HP1α from a new binding site
surrounding H3Y41. Given that H3Y41 lies within a region known to 
affect nucleosome remodelling”, phosphorylation of H3Y41 may 
regulate chromatin architecture around specific gene promoters. 

The displacement of HP1α by JAK2 is likely to be tightly regulated
in normal cells, whereas in malignancies driven by constitutive 
activation of JAK2, unregulated displacement of chromatin-bound 
HP1α may override its potential tumour suppressive functions
(Fig. 4c). HP1α is recognized to reduce mitotic recombination”,
repress the transcription of heterochromatic genes” and preserve 
centromeric architecture, leading to the faithful segregation of sister 
chromatids’’. Indeed, the phenotypic consequences of constitutive 
JAK2 activation in haematological malignancies (increased gene 
expression, mitotic recombination and genetic instability)'*'? are 
consistent with the reversal of these HP1α functions. This suggestion
is further supported by the fact that enforced overexpression of HP1 
ameliorates the leukaemic phenotype of overactive JAK signalling in 
D. melanogaster’?. 


METHODS SUMMARY 


Cell culture and isolation of peripheral blood stem cells were performed with 
standard methodology’*. Immunofluorescence images were captured with an 
Olympus Fluoview FV1000 microscope, and cells were prepared and stained 
as described previously'*’’. Cell fractionation, immunoprecipitation, western 
blotting and kinase assays were performed with standard methodology”. 
Peptides (Supplementary Table 1) were synthesized by Almac Sciences and used 
for binding/competition assays as described previously”. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS

Cell culture and transfection. HEL, SET2, UKE-1, HL60, K562 and BaF3 cells were
grown in RPMI 1640 medium (Sigma-Aldrich), and γ2A cells were grown in
DMEM medium (Sigma-Aldrich). All growth media were supplemented with
10% fetal calf serum and 1% penicillin/streptomycin. BaF3 cells were grown in
the presence of 10 ng ml⁻¹ IL-3. Cells were incubated at 37 °C and 5% CO₂.
Transient transfection of γ2A cells was performed with FuGENE (Roche
Applied Science) in accordance with the manufacturer’s instructions.
Cytokine stimulation was performed in K562 and BaF3 cells after 72 h of serum
starvation. LIF (1,000 IU ml⁻¹), PDGF-BB (10 ng ml⁻¹) and IL-3 (10 ng ml⁻¹)
were used individually to stimulate the cells for up to 90 min. Mouse embryonic
fibroblasts with the functional status of H2AX WT (H2AX+/+) and H2AX-null
(H2AX−/−) were provided by K. Miller and S. P. Jackson.

Isolation of peripheral blood stem cells. Mononuclear cells from peripheral
blood were separated over a Ficoll density gradient. CD34+ cells were then
purified by a double-positive magnetic cell sorting system (AutoMACS;
Miltenyi Biotec), in accordance with the manufacturer’s instructions.

Immunofluorescence microscopy. HEL, SET2, UKE-1, HL60, K562 and
CD34+ cells were washed once in 1× PBS before cytocentrifugation onto
polylysine-coated microscope slides. γ2A cells were grown on coverslips before
washing in 1× PBS. Cells were fixed for 30 min in methanol at −20 °C. After stepwise
incubation with a primary antibody, and then a secondary fluorescent antibody,
cells were stained with Hoechst 33258 (Sigma-Aldrich) and mounted with
Vectashield mounting medium (Vector Laboratories). Confocal laser images
were captured with an Olympus Fluoview FV1000 microscope equipped with
a 40× oil-immersion lens. Image processing was performed with Photoshop
(Adobe Systems).

Cell fractionation, immunoprecipitation and immunoblotting. Cytoplasmic, 
nucleosolic and chromatin fractions were prepared from cells as described previously”*.
In brief, cells were washed twice in 1× PBS and once in buffer A (10 mM
HEPES pH 7.9, 1.5 mM MgCl₂, 10 mM KCl, 0.5 mM dithiothreitol (DTT) and
protease inhibitor cocktail). Cells were then pelleted and resuspended in buffer A
with 0.1% (v/v) Nonidet P40 and incubated on ice for 10 min. The supernatant
containing the cytoplasmic fraction was collected after centrifugation and the
pellet was resuspended in an equal volume (relative to the cytoplasmic extract) of
buffer B (20 mM HEPES pH 7.9, 1.5 mM MgCl₂, 300 mM NaCl, 0.5 mM DTT,
25% (v/v) glycerol, 0.25% Triton X-100, 0.2 mM EDTA and protease inhibitor
cocktail). After centrifugation, the supernatant contains the nucleosolic fraction 
and the insoluble pellet is composed primarily of chromatin and associated 
proteins. Equal volumes of cytoplasmic and nucleosolic fractions were separated 
by SDS-PAGE, transferred to nitrocellulose and probed with relevant antibodies. 
For immunoprecipitation, cells were lysed in IPH (150 mM NaCl, 50 mM Tris- 
HCl pH 8.0, 5mM EDTA, 0.5% (v/v) Nonidet P40) on ice for 15 min and the 
supernatant was collected after centrifugation and used for immunoprecipitation. 
Sodium orthovanadate (1 mM) was added to all solutions when performing 
assays relating to the study of tyrosine phosphorylation. Extracted proteins were 
mixed with 2× Laemmli sample buffer, separated by SDS-PAGE, transferred to
nitrocellulose or poly(vinylidene difluoride) (PVDF) membranes (Millipore) and 
stained with Ponceau S to ensure equal transfer. Membranes were then sequen- 
tially incubated with primary antibodies and secondary antibodies conjugated 
with horseradish peroxidase. Membranes were then incubated for enhanced 
chemiluminescence (ECL; GE Healthcare) and proteins were detected by exposure
to X-ray film. Dot-blot assays were performed by spotting synthetic peptide
onto pre-wetted PVDF membrane. The membrane was then sequentially probed 
with primary and secondary antibodies as above. If appropriate, the primary 
antibody incubation was performed in the presence of competitor peptides 
(1.0 µg ml⁻¹) as indicated in the relevant figure panel.

Kinase assays. The JAK2 Enzymatic Assay Kit, HTScan (Cell Signaling 
Technology) containing active JAK2 as a glutathione S-transferase (GST) fusion 
protein, recombinant AKT1 (Cell Signaling Technology) and human JAK2 
immunoprecipitated (see antibodies) from HEL cells were used in kinase assays 
in vitro employing the same reaction conditions. In brief, assays were performed 
in 50 µl of kinase buffer (60 mM HEPES pH 7.9, 5 mM MgCl₂, 5 mM MnCl₂,
3 µM Na₃VO₄, 1.25 mM DTT, 20 µM ATP). [γ-³²P]ATP (370 kBq;
222 TBq mmol⁻¹; Perkin Elmer) was added to the buffer in the radiolabelled
kinase assays. Calf thymus histones (2 µg) (core histone, an equimolar mixture of
H3, H2A, H2B and H4, 10223565001, Roche; and purified H3 from calf thymus,
11034758001, Roche) or recombinant histones were used as substrates.
Recombinant histones were expressed in, and purified from, bacteria as 
described previously”. 

Site-directed mutagenesis. Mutagenesis to introduce the H3Y41F mutant into 
human histone H3 was performed with the Quickchange Site-directed 
Mutagenesis kit (Stratagene) in accordance with the manufacturer’s instructions. 

JAK2 inhibitors. AT9283 was provided by J. Lyons and M. Squires. TG101209
(TargeGen Inc.) is commercially available and was used at 10 nM in the in vitro
kinase assays and at 3 µM in vivo. AT9283 was used at 300 nM in vivo.

Antibodies. The principal H3Y41ph antibody used in the manuscript was
anti-H3Y41ph (ab26310; Abcam), and it was used for western blotting at 1:1,000
dilution; two further H3Y41ph antibodies were raised in rabbits against a
Y41ph peptide spanning residues H3(37-48) by using the Eurogentec 28-day
commercial protocol. The following antibodies were also used at the stated
dilutions: anti-JAK2 antibodies (D2E12 no. 3230; Cell Signaling Technology),
(IMG-3007; Imgenex) western blot 1:1,000; immunofluorescence 1:100,
immunoprecipitation 1:250; (AB3804; Millipore) western blot 1:1,000 and
phospho-JAK2 (ab32101) western blot 1:500; anti-β-tubulin (T5201;
Sigma-Aldrich) western blot 1:750; anti-phosphotyrosine (4G10; Millipore) western
blot 1:1,000; anti-H3 (ab1791; Abcam) western blot 1:10,000; anti-GAPDH
(ab9483; Abcam) western blot 1:5,000; anti-H2AX (ab1175; Abcam) western blot
1:5,000; anti-Flag (Sigma-Aldrich) immunoprecipitation 1:250; anti-HP1α (no.
2616; Cell Signaling Technology) western blot 1:1,000, immunofluorescence
1:400; anti-HP1α (clone 15.19s2; no. 05-689; Millipore); anti-HP1β (no. 2613;
Cell Signaling Technology) western blot 1:1,000; Texas-red-conjugated IgG
(Invitrogen) immunofluorescence 1:250; and Alexa Fluor-488-conjugated IgG
(Invitrogen) immunofluorescence 1:250.

Recombinant protein production. Recombinant proteins were expressed in and 
purified from Escherichia coli as described*®. Mouse full-length HP1 isoforms 
and the chromo domain (residues 5-80), hinge (residues 61-121) and chromo-
shadow domain (residues 110-188) of HP1α were cloned into pGex vector and
expressed as a GST fusion protein. 

Pulldown assays. GST fusion proteins and biotin-conjugated peptides were 
incubated for 1h with glutathione-agarose beads or streptavidin-Sepharose 
beads, respectively, in binding buffer (150 mM NaCl, 50 mM Tris-HCl pH 8.0, 
5 mM EDTA, 0.25% (v/v) Nonidet P40) at room temperature (between 20 and 
25°C). After being washed three times in binding buffer, the beads were incu- 
bated with their potential binding proteins for 1h at room temperature. The 
beads were then washed four times with binding buffer, after which bound 
protein was eluted with hot 2 X Laemmli sample buffer. 

Preparation of nuclei for assessing soluble and chromatin-bound HP1α/β.
HEL nuclei were purified*' and permeabilized” as described, except that 300 mM 
NaCl and 0.25% (v/v) Triton X-100 were used. Nuclei were pelleted and the 
chromatin fraction was separated from the soluble nuclear fraction as described 
previously’. The chromatin and soluble nuclear fractions were then analysed by 
western blotting for HP1α and HP1β. When comparing the localization of HP1α
in the presence or absence of JAK2 inhibitors, a batch of cells was split into three
equal amounts and incubated with vehicle alone (DMSO), TG101209 (3 µM) or
AT9283 (300 nM) for 4 h before the isolation of nuclei. Permeabilized nuclei
were diluted into PBS and incubated for 2 h with 0.75 ng ml⁻¹ H3K9me3 peptide
on ice. They were then processed as above. 

Peptide competition and immunofluorescence of mammalian cells. HL60 
cells were cytospun onto polylysine-coated slides and fixed for 2 min in ice-cold
methanol (containing 10 µg ml⁻¹ peptide, where used). They were then blocked
for 15 min in 3% bovine serum albumin, 0.6% (v/v) Triton X-100 in PBS (containing
10 µg ml⁻¹ peptide, where used). We performed staining with the anti-HP1α
antibody. Antibody incubations contained 10 µg ml⁻¹ peptide (where
used). The displacement of HP1α from the nucleus was quantified with ImageJ
software (National Institutes of Health).

Gene expression and computational analysis. HEL cells were treated for 4 h with 
either TG101209 JAK2 inhibitor or DMSO (vehicle) alone. From these cells, 
mRNA was prepared. Gene expression changes (log2 fold with and without inhibitor)
of duplicate expression profiling samples were calculated with Bioconductor
(http://www.bioconductor.org/). Using Illumina Gene IDs, we obtained Ensembl 
gene coordinates for human genome build NCBI 36.1, using Biomart (http:// 
www.biomart.org/). To map conserved STAT5-binding sites in non-coding 
sequences, we generated a genome-wide data set of STAT5 motifs 
(TTCYNRGAA) conserved in human/mouse whole-genome alignments obtained 
from the UCSC genome browser (http://genome.ucsc.edu/) using the TFBSCluster 
program (http://hscl.cimr.cam.ac.uk/TFBScluster_genome_portal.html) with the 
non-exact search parameters (for example, the ambiguous letters YNR may differ 
between the human and mouse sequences). Finally, the number of conserved 
STAT5 sites in each gene locus (ranging from 50 kilobases 5′ of the first exon to
50 kilobases 3′ of the last exon) was calculated with an in-house Perl script.
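
The motif search and per-locus counting described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' TFBSCluster run or their in-house Perl script: the function names are hypothetical, the human/mouse conservation filter is not reproduced, and the locus is assumed to lie on the forward strand so that 5′ means lower coordinates. Because TTCYNRGAA is its own reverse complement under IUPAC codes, scanning one strand is sufficient.

```python
import re

# IUPAC codes needed for the STAT5 consensus TTCYNRGAA.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_sites(sequence: str, motif: str = "TTCYNRGAA") -> list:
    """Return 0-based start positions of every motif match in `sequence`."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


def count_sites_in_locus(site_positions, first_exon_start: int,
                         last_exon_end: int, flank: int = 50_000) -> int:
    """Count sites from `flank` bp upstream of the first exon to `flank` bp
    downstream of the last exon (forward-strand coordinates assumed)."""
    lo = max(0, first_exon_start - flank)
    hi = last_exon_end + flank
    return sum(lo <= pos <= hi for pos in site_positions)


if __name__ == "__main__":
    toy = "AATTCCAGGAATT"   # toy sequence, not genomic data
    print(find_sites(toy))
```

In practice the positions passed to `count_sites_in_locus` would be genome coordinates of conserved sites rather than matches in a toy string.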

Chromatin immunoprecipitation assay and real-time PCR analysis. HEL cells
were treated for 4h with either TG101209 JAK2 inhibitor or DMSO (vehicle) 
alone. From these cells, chromatin was prepared and chromatin immunopreci- 
pitation was performed as described previously*’, with the following important 
exceptions. Cells were crosslinked with 1% (v/v) formaldehyde for 15 min at 
room temperature; crosslinking was stopped by the addition of 0.125 M glycine. 
The time and percentage of formaldehyde crosslinking are crucial to ensure
optimal recognition of the H3Y41ph and HP1α epitopes. Cells were then lysed
in 1% (w/v) SDS, 10 mM EDTA, 50 mM Tris-HCl pH 8.0, 1 mM sodium ortho- 
vanadate and protease inhibitors. Cells were sonicated in a Bioruptor 
(Diagenode) to achieve a mean DNA fragment size of 500 base pairs. An equal 
volume of Protein-A and Protein-G agarose beads, pre-absorbed with sonicated 
salmon-sperm DNA and BSA, were used to preclear the chromatin for 2 h before 
immunoprecipitation. Immunoprecipitation was performed for a minimum of 
12 h at 4 °C in modified RIPA buffer (1% (v/v) Triton X-100, 0.1% (w/v) sodium
deoxycholate, 0.1% (w/v) SDS, 90mM NaCl, 10mM Tris-HCl pH 8.0, 1 mM 
sodium orthovanadate, EDTA-free protease inhibitors). An equal volume of 
Protein-A and Protein-G agarose beads, pre-absorbed with sonicated salmon- 
sperm DNA and BSA, were used to bind the antibody and associated chromatin. 
The beads were washed before elution of the antibody-bound chromatin. 
Reverse crosslinking of DNA was followed by DNA purification with the 
QIAquick PCR purification kit (Qiagen). Immunoprecipitated DNA was ana- 
lysed on an ABI 7300 real-time PCR machine, with power SYBRgreen PCR 
mastermix in accordance with the manufacturer’s instructions. The following 
primer pairs were used in the chromatin immunoprecipitation analysis: Lmo2 
amplicon 1, 5’-CAGGCTTCTCCCGTGTAACTG-3’ (forward) and 5'-AGGAC 
CTCACACGTTGAAGACA-3’ (reverse); Lmo2 amplicon 2, 5'-AGGGAAGTAT 
GACACAATCGAACA-3’ (forward) and 5’-TGGCAGAGCCCGTATGCTA-3’' 
(reverse); Lmo2 amplicon 3, 5'-CCAGACAAACTCAAATAACGTACACA-3' 
(forward) and 5'‘-AGTGGGTACCATTGTCCCTGTT-3’ (reverse); Lmo2 ampli- 
con 4, 5’-CCTACTCAGAATGTGGAGACTTGTG-3' (forward) and 5’-TGGCC 
TCTGGGAATTGGA-3’ (reverse); Lmo2 amplicon 5, 5’-GGACTTCGCTCT 
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TCCATCCA-3’ (forward) and 5'-GGCATCGGTGTCAGACCAA-3’ (reverse); 
B2-microglobulin, 5’-TGGGCACGCGTTTAATATAAGTG-3’ (forward) and 
5'-GCCCGAATGCTGTCAGCTT-3’ (reverse). 

Messenger RNA was prepared from cell extracts with the Qiagen RNeasy kit in accordance with the manufacturer's instructions. Complementary DNA was then prepared with Superscript III Reverse Transcriptase (Invitrogen) and analysed on an ABI 7300 real-time PCR machine, using Power SYBR Green PCR master mix in accordance with the manufacturer's instructions. The following primer pairs were then used in the cDNA analysis:
Lmo2, 5′-CGGCGCCTCTACTACAAACT-3′ (forward) and 5′-GAATCCGCTTGTCACAGGAT-3′ (reverse);
β2-microglobulin, 5′-TGACTTTGTCACAGCCCAAG-3′ (forward) and 5′-AGCAAGCAAGCAGAATTTGG-3′ (reverse).
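
The text does not state how the real-time PCR readings were converted into expression levels. One common approach, shown here purely as an assumption, is the comparative Ct (2^-ΔΔCt) method, with Lmo2 as the target and β2-microglobulin as the reference. The function name and the Ct values below are hypothetical, and the calculation assumes 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target in treated versus control cells (2^-ddCt)."""
    # Normalise the target to the reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the two conditions; one cycle corresponds to a twofold change.
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in treated cells while the reference gene is unchanged.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```
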
Structural insights into mechanisms of the small RNA methyltransferase HEN1


Ying Huang, Lijuan Ji, Qichen Huang, Dmitry G. Vassylyev, Xuemei Chen & Jin-Biao Ma


RNA silencing is a conserved regulatory mechanism in fungi, plants and animals that regulates gene expression and defence against viruses and transgenes. Small silencing RNAs of ~20–30 nucleotides and their associated effector proteins, the Argonaute family proteins, are the central components in RNA silencing. A subset of small RNAs, such as microRNAs and small interfering RNAs (siRNAs) in plants, Piwi-interacting RNAs in animals and siRNAs in Drosophila, requires an additional crucial step for their maturation; that is, 2′-O-methylation on the 3′ terminal nucleotide. A conserved S-adenosyl-L-methionine-dependent RNA methyltransferase, HUA ENHANCER 1 (HEN1), and its homologues are responsible for this specific modification. Here we report the 3.1 Å crystal structure of full-length HEN1 from Arabidopsis in complex with a 22-nucleotide small RNA duplex and cofactor product S-adenosyl-L-homocysteine. Highly cooperative recognition of the small RNA substrate by multiple RNA binding domains and the methyltransferase domain in HEN1 measures the length of the RNA duplex and determines the substrate specificity. Metal ion coordination by both 2′ and 3′ hydroxyls on the 3′-terminal nucleotide and four invariant residues in the active site of the methyltransferase domain suggests a novel Mg²⁺-dependent 2′-O-methylation mechanism.

HEN1 was first identified in a genetic screen as a floral patterning gene and later found to be essential for Arabidopsis microRNA (miRNA) accumulation in vivo. Subsequently, HEN1 was demonstrated to be a methyltransferase for miRNAs and all types of siRNAs in plants. The 2′-O-methylation protects miRNAs and siRNAs from 3′-end uridylation and 3′-to-5′ exonuclease-mediated degradation in Arabidopsis. The plant HEN1 and its animal homologues share a highly conserved methyltransferase (MTase) domain (Fig. 1e) that is not closely related to any known RNA 2′-O-MTases according to a phylogenetic analysis. Two putative RNA binding modules, a double-stranded RNA binding domain (dsRBD) and a La motif, have been identified in the amino-terminal region of HEN1 (ref. 15). To understand the specific recognition of small RNA substrates and the molecular mechanism of the 3′-end 2′-OH-specific methylation by HEN1 and its homologues, we determined the crystal structure of full-length Arabidopsis HEN1 in complex with a small RNA duplex in the presence of the cofactor product S-adenosyl-L-homocysteine (AdoHcy).

The recombinant full-length Arabidopsis HEN1 (residues 1–942) was co-crystallized with AdoHcy and a 22-nucleotide small RNA duplex containing a fully complementary 20-nucleotide segment (Fig. 1f) derived from a natural substrate of HEN1, miR173/miR173* (refs 3, 11) (Supplementary Fig. 2c). The crystal structure was determined at 3.1 Å as described in Methods. The structure revealed that Arabidopsis HEN1 binds to the small RNA substrate as a monomer (Fig. 1), which is supported by results from gel filtration experiments (Supplementary Fig. 2). The small RNA substrate exhibits an A-form conformation in the ternary complex structure and both duplex termini are specifically recognized by HEN1. The HEN1 protein consists of five structural domains (Fig. 1e), four of which directly interact with the small RNA substrate (Fig. 1a–c); the exception is the PPIase-like domain (PLD), which shows a high degree of structural similarity to well characterized FK506-binding proteins. The A-form duplex of the small RNA substrate is bound by two double-stranded RNA (dsRNA)-specific binding domains, dsRBD1 and dsRBD2. The [5′-m:3′-u] terminus containing the 3′-end 2-nucleotide overhang of the strand that is not methylated (u strand) (Fig. 1f) is bound by the La-motif-containing domain (LCD). Meanwhile, the 3′-end 2-nucleotide overhang of the strand that is methylated (m strand) (Fig. 1f) is deeply buried in the active site of the MTase domain (Fig. 1c). The interface between HEN1 and the small RNA substrate buries a total solvent-accessible surface area of ~5,000 Å² (Fig. 1c), to which dsRBD1, dsRBD2, LCD and the MTase domain contribute 31%, 13%, 17% and 33%, respectively.
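
As a quick arithmetic check on the figures in the preceding paragraph, the sketch below converts the quoted percentages into approximate areas. The total of ~5,000 Å² and the four percentages come from the text; the variable names are illustrative only, and the results are only as precise as the rounded inputs.

```python
TOTAL_BURIED_A2 = 5000.0  # approximate total buried surface area, in square angstroms

# Fractions of the buried surface quoted for each RNA-contacting domain.
fractions = {"dsRBD1": 0.31, "dsRBD2": 0.13, "LCD": 0.17, "MTase": 0.33}

# Approximate area contributed by each domain.
areas = {domain: fraction * TOTAL_BURIED_A2 for domain, fraction in fractions.items()}

# Share of the interface accounted for by the four named domains.
accounted = sum(fractions.values())

for domain, area in areas.items():
    print(f"{domain}: ~{area:.0f} A^2")
print(f"accounted for: {accounted:.0%}")
```

The four quoted percentages sum to 94%; the text does not say which parts of the protein account for the remainder.
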

Structure-based sequence alignment and structural superimposition revealed that both dsRBDs contain distinct long insertions in the loop between β1 and β2 (Supplementary Fig. 4). The insertion in dsRBD1 is well defined (Supplementary Fig. 4b), in which a conserved hydrophobic patch stacks over the carboxy-terminal β-strand in the MTase domain (Supplementary Fig. 4c). The insertion in dsRBD2 is longer but less conserved than that in dsRBD1 in most plant HEN1 proteins (Supplementary Fig. 4a) and is completely disordered in the current structure (Supplementary Fig. 4b). Three conserved RNA binding motifs in canonical dsRBDs can be identified in dsRBD1 (Fig. 2a), whereas only two RNA binding motifs are identified in dsRBD2 owing to the disordered loop between β1 and β2 (Fig. 2b). As revealed by buried surface analysis, the interaction of dsRBD1 with the RNA substrate is more extensive than that of dsRBD2. Compared to dsRBD1, dsRBD2 shifts by approximately 3 Å away from the RNA duplex, which may favour binding small RNA duplexes with bulges that are common among miRNAs. The binding of the RNA duplex by dsRBD1 has a key role in substrate recognition, as deletion of dsRBD1 markedly reduced the substrate binding by HEN1, as determined by a cross-linking binding assay (Supplementary Fig. 5a), and its activity, as revealed by a small RNA methyltransferase assay (Supplementary Fig. 5b).

As predicted by bioinformatics analysis, the N-terminal half of the LCD contains a La motif fold (Supplementary Fig. 6) that has been shown to specifically bind RNA 3′ ends through synergistic cooperation with an RNA recognition motif in the La protein. However, recognition of the [5′-m:3′-u] duplex terminus of the small RNA substrates by the La motif and the C-terminal portion of LCD in HEN1 (Fig. 1f) is different from that observed in the human La protein. The 3′-terminal nucleotide binding pocket in the human La N-terminal domain (NTD) is occupied by two conserved residues, H120 and P121, within a HEN1-specific insertion (Supplementary Fig. 6c). The 2-nucleotide 3′ overhang of the u strand is looped out from the RNA duplex towards the first α-helix of the La motif and the phosphate of the overhang is bound by Y109 from the first α-helix of the La motif (Fig. 2c). Mutation of Y109 to alanine has no detectable effect on substrate binding or HEN1 activity (Supplementary Fig. 5a, b), indicating that this interaction is not essential for the interaction with small RNA substrates. This result is also consistent with a previous study showing that mutation of the 3′-end 2-nucleotide overhang of the u strand to either a 1-nucleotide or 3-nucleotide overhang has no effect on HEN1 activity on the m strand.

Furthermore, W333, a conserved residue within a loop from the C-terminal portion of the LCD, stacks over the base of the 5′-terminal nucleotide G1ₘ (Fig. 2c), and the side chain of W333 occupies the same position as the base of the antepenultimate nucleotide in the structure of the La NTD–RNA complex (Supplementary Fig. 6c). Therefore, W333 exactly stacks on H120 and P121 in the La motif of the LCD, which may stabilize the stacking interaction between W333 and the 5′-terminal nucleotide. This end-capping interaction has an essential role in the recognition of small RNA substrates, because the W333A mutant loses both RNA binding ability and small RNA methyltransferase activity (Supplementary Fig. 5a, b). A similar interaction between small RNA duplex and tryptophan residues has been observed in structures of the viral RNA silencing suppressor p19–small RNA complexes. In fact, p19 interferes with small RNA 3′-end methylation by HEN1 (ref. 23). In addition, recognition of the [5′-m:3′-u] terminus by the LCD is also strengthened by a group of positively charged residues that project side chains into the major groove of the duplex terminus (Fig. 2c).

Figure 1 | Structures of HEN1 in complex with a small RNA duplex and AdoHcy. a, Ribbon diagram of the complex. dsRBD1, violet; La motif, chocolate; LCD, wheat; dsRBD2, cyan; PLD, purple; MTase, green; linkers including L1, L2 and L4, grey; RNA strand to be methylated (m strand), red; RNA strand not to be methylated (u strand), blue; AdoHcy, yellow; Mg²⁺, brown. b, Ribbon diagram of the complex rotated by 90° about the horizontal axis relative to a. c, d, Surface and surface charge views of HEN1 in the complex in the same orientation as a. e, Schematic representation of the domains in HEN1 with the same colour codes as in a. The disordered L3 is represented by the dashed line. f, Sequences of the small RNA duplex used in the co-crystallization. The m strand and u strand are coloured red and blue, respectively, and the two termini, [5′-m:3′-u] and [5′-u:3′-m], are indicated.

The MTase domain of HEN1 adopts a core α/β Rossmann structure, in which the cofactor product AdoHcy is bound as in classical S-adenosyl-L-methionine (AdoMet)-dependent MTases (Fig. 3b). The ribose ring of AdoHcy directly stacks over the 5′-terminal nucleotide U1ᵤ and the 5′ phosphate of the u strand is hydrogen bonded to the side chain of S747 (Fig. 3b). Three conserved positively charged residues (K749, R753 and K756) interact with the major groove of the [5′-u:3′-m] terminus (Fig. 3b), enhancing the 5′ phosphate interaction. These three positively charged residues and S747 are only conserved in plant HEN1 proteins (Supplementary Fig. 7), indicating that recognition of the 5′-phosphate by the MTase domain is not applicable to animal HEN1 homologues. The backbone phosphate connecting the 2-nucleotide 3′ overhang and the RNA duplex segment is anchored by a loop formed by six residues (F692 to L697) within motif X of the MTase domain (Fig. 3a). In particular, two non-bridged phosphodiester oxygens form hydrogen bonds with main-chain amines of F692 and L697, respectively (Fig. 3a). The loop structure is further stabilized by hydrophobic stacking interactions between the side chains of F693 and L697 and by the hydrogen bond formed by the side-chain amide of the invariant residue Q700 and the carboxyl oxygen between tandem prolines P695 and P696 in the loop (Fig. 3a). Most residues in this loop are invariant in the MTase domains of both plant HEN1 and animal homologues (Supplementary Fig. 7), indicating that this specific interaction by the conserved loop in motif X may also be applicable to animal HEN1 homologues.

The penultimate nucleotide A21ₘ of the 2-nucleotide 3′ overhang is flipped out from the duplex and the base of A21ₘ is stacked on the side chains of the conserved residues R856 and L835 (Fig. 3a). The 3′-end nucleotide G22ₘ is flipped back and its base is stacked over the terminal base pair of the duplex (Fig. 3b, c). There are no intermolecular hydrogen bonds between the two bases of the 2-nucleotide 3′ overhang and the MTase domain, which is consistent with the non-sequence-specific methyltransferase activity of HEN1. The backbone phosphate of the 2-nucleotide 3′ overhang is secured by two invariant, positively charged residues, R701 and R856 (Fig. 3). Mutation of either R701 or R856 to alanine attenuates the methyltransferase activity of HEN1 (Supplementary Fig. 5b, c), indicating that these two residues are important for the efficiency of HEN1 activity but are not essential.

The ribose ring of G22ₘ is located in the centre of the active site of the MTase domain, where both the 2′ and 3′ hydroxyls of G22ₘ and the side chains of four invariant residues (E796, E799, H800 and H860) are coordinated to a metal ion, Mg²⁺ (Fig. 3c, d and Supplementary Fig. 8). The highly organized Mg²⁺-mediated coordination precisely presents the 2′ hydroxyl of the 3′-terminal nucleotide towards the Sδ atom of AdoHcy (Fig. 3d), indicating that the 2′-O-methylation by HEN1 may be Mg²⁺-dependent (Supplementary Fig. 9). Treatment with increasing concentrations of EDTA, which chelates Mg²⁺ in the reaction, eventually eliminates HEN1 activity (Supplementary Fig. 5d, f), suggesting that HEN1 is indeed a Mg²⁺-dependent small RNA methyltransferase. Mutation of any one or two coordinated residues to alanines completely abolished HEN1 activity (Supplementary Fig. 5b, c).

Previous biochemical studies defined the features of a small RNA substrate that are strictly required for HEN1 activity: a length of 19–25 nucleotides, a duplex with 2-nucleotide 3′ overhang, and free 2′ and 3′ hydroxyls on the 3′ terminal nucleotide. The structure of the HEN1–small RNA complex revealed that multiple domains in HEN1 cooperate to bind small RNA substrates, which precisely illuminates the RNA substrate specificity of HEN1 (Fig. 4). The RNA substrate may be initially targeted by the N-terminal domain dsRBD1 in HEN1, which allows HEN1 to act only on double-stranded RNAs. The recognition of an RNA duplex by a classical dsRBD spans about 16 bp. Thus, the small RNA duplexes produced in plants, approximately 21–24 nucleotides long, are well targeted in the initial recognition. The end-capping interaction by LCD is synergized by dsRBD2, which, together with dsRBD1, forms a strong grip on the duplex region of the small RNA substrate, and these interactions help position the other duplex terminus towards the MTase domain. The recognition of the 2-nucleotide 3′ overhang by the MTase domain and the coordination of both the 2′ and 3′ hydroxyls of the 3′-terminal nucleotide to Mg²⁺ restrict the MTase domain within a limited range where it can efficiently methylate the 2′-hydroxyl on the 3′-end nucleotide. Overall, the preferred length of the small RNA substrates recognized by HEN1 is determined by the distance between the active site of the MTase domain and the 5′-end-capping site in the LCD (Fig. 4).

Figure 2 | Small RNA substrate recognition by dsRBDs and LCD. a, The duplex region of the small RNA substrate is bound by three RNA binding motifs in dsRBD1. b, The duplex region of the small RNA substrate is bound by two RNA binding motifs (RBM1 and RBM3) in dsRBD2. c, The LCD binds to the [5′-m:3′-u] terminus of the small RNA substrate. The 2-nucleotide 3′ overhang of the u strand is recognized by the La motif. The base of the 5′-terminal nucleotide G1ₘ is end-capped by W333 in the C-terminal LCD.
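
The three substrate requirements quoted above from earlier biochemical studies (a length of 19–25 nucleotides, a 2-nucleotide 3′ overhang, and free 2′ and 3′ hydroxyls on the 3′-terminal nucleotide) can be written as a simple check. This is only an illustrative sketch: the `SmallRnaDuplex` record and its field names are hypothetical, and the check encodes the stated criteria, not the structural recognition itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallRnaDuplex:
    strand_length: int         # nucleotides in the strand to be methylated
    three_prime_overhang: int  # unpaired nucleotides at that strand's 3' end
    free_2_hydroxyl: bool      # 2'-OH of the 3'-terminal nucleotide is unmodified
    free_3_hydroxyl: bool      # 3'-OH of the 3'-terminal nucleotide is unmodified


def meets_reported_criteria(duplex: SmallRnaDuplex) -> bool:
    """True if the duplex satisfies the three requirements quoted in the text."""
    return (
        19 <= duplex.strand_length <= 25
        and duplex.three_prime_overhang == 2
        and duplex.free_2_hydroxyl
        and duplex.free_3_hydroxyl
    )


# A 22-nucleotide strand with a 2-nucleotide overhang, like the crystallized duplex.
crystal_like = SmallRnaDuplex(22, 2, True, True)
# The same duplex after 2'-O-methylation no longer has a free 2' hydroxyl.
already_methylated = SmallRnaDuplex(22, 2, False, True)
```
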

The mode by which HEN1 measures the length of the small RNA substrate is similar to that of the RNase Dicer, a molecular ruler cleaving the dsRNA substrate at a specified distance from the duplex terminus recognized by the PAZ domain, although the 3′-end recognition by HEN1 is different compared with that by the PAZ domain. Animal HEN1 homologues only act on single-stranded small RNAs, and their small RNA methyltransferase activities are stimulated through interaction with Argonaute proteins (Y. Kirino, personal communication). Thus, it is possible that animal HEN1 homologues adopt an alternative mode to recognize small RNA substrates (Supplementary Fig. 10), but the mechanism of the Mg²⁺-dependent 2′-O-methylation by the MTase domain is expected to be conserved.

Figure 3 | Small RNA substrate recognition by the MTase domain. a, The phosphate connecting the 2-nucleotide 3′-overhang of the m strand with the duplex region is specifically recognized by a conserved loop (F692–L697). The penultimate nucleotide A21ₘ is flipped out and its base is stacked on the side chains of L835 and R856. The phosphate of the 2-nucleotide overhang is hydrogen bonded by R701 and R856. b, The base of the 3′-terminal nucleotide of the m strand G22ₘ is stacked on the terminal base pair formed by A20ₘ and U1ᵤ and the 5′-phosphate of the u strand is recognized by S747. c, Both 2′ and 3′ hydroxyls of the 3′-terminal nucleotide G22ₘ are coordinated to Mg²⁺ along with four invariant residues, E796, E799, H800 and H860. d, A stereo view of the Mg²⁺ coordination covered with Fo − Fc electron density omit map contoured at 3.0σ.

Figure 4 | Proposed model for the specific recognition of small RNA substrates by HEN1 and the Mg²⁺-dependent 2′-O-methyltransferase mechanism. A small RNA substrate is targeted by multiple RNA binding domains in HEN1. The duplex region is gripped by dsRBD1 and dsRBD2, and one terminus is projected towards the MTase domain that is located within a range of 18–22 bp from another terminus end-capped by a tryptophan residue in LCD. Consequently, the MTase domain preferably recognizes the 2-nucleotide 3′ overhang on the small RNA substrate of 20–24 nucleotides in length and methylates the 2′-hydroxyl of the 3′-terminal nucleotide in a Mg²⁺-dependent manner.
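
The ruler model in Figure 4 relates duplex length in base pairs to strand length in nucleotides. The sketch below tabulates that relation for the 18–22 bp range given in the caption and adds a rough estimate of the physical span. The helical rise of about 2.8 Å per base pair for A-form RNA is an assumption introduced here and is not a value reported in the paper; the function and constant names are illustrative.

```python
RISE_PER_BP_A = 2.8  # assumed A-form helical rise per base pair, in angstroms
OVERHANG_NT = 2      # 2-nucleotide 3' overhang on the methylated strand


def strand_length(duplex_bp: int) -> int:
    """Strand length in nucleotides: paired region plus the 3' overhang."""
    return duplex_bp + OVERHANG_NT


def duplex_span(duplex_bp: int) -> float:
    """Approximate end-to-end length of the paired region, in angstroms."""
    return duplex_bp * RISE_PER_BP_A


# (base pairs, strand length in nucleotides, approximate span in angstroms)
ruler_table = [(bp, strand_length(bp), round(duplex_span(bp), 1)) for bp in range(18, 23)]

for bp, nt, span in ruler_table:
    print(f"{bp} bp duplex -> {nt} nt strand, ~{span} A between termini")
```

Under this assumption, the 18–22 bp range corresponds to strands of 20–24 nucleotides, matching the preferred substrate length stated in the caption.
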


METHODS SUMMARY

The cDNA of the full-length Arabidopsis HEN1 was cloned into the vector pET28 to result in an N-terminal 6×His tag and expressed in Escherichia coli BL21-Gold(DE3). The protein was purified by affinity, ion exchange and gel filtration chromatography and concentrated. HEN1 mutants were obtained with the QuikChange site-directed mutagenesis kit (Stratagene) or a PCR-based method, and verified by sequencing. RNA oligonucleotides used in the crystallization and assays were ordered from Dharmacon or Integrated DNA Technologies and purified by PAGE or HPLC. Small RNA duplexes were annealed before use. Crystals of HEN1 in complex with the small RNA duplex and AdoHcy were obtained by vapour diffusion with the reservoir solution of 15% PEG3350, 0.2 M sodium chloride, 0.01 M sodium bromide and 0.1 M phosphate-citrate, pH 4.8. The 3.1 Å native data and the 3.4 Å MAD data were collected at beamlines 19BM and 23ID of Argonne National Laboratory, respectively (Supplementary Table 1). The final model was refined on 3.1 Å native data to Rfree 28.8% and Rfactor 26.0% with good stereochemistry. Figures were prepared with PyMol (http://www.pymol.org).

The in vitro small RNA methyltransferase assay was performed as previously described with minor modifications. Briefly, 100 µl methyltransferase reactions were set up for annealed small RNA substrates and HEN1 mutants and monitored by incorporation of the [¹⁴C]methyl group. To assay the Mg²⁺-dependent methyltransferase activity, Mg²⁺ was omitted from all annealing buffers and reactions except as indicated. Different amounts of Mg²⁺ and EDTA were added into individual reactions containing the HEN1 protein and were incubated at room temperature for 15 min before adding small RNA substrates and [¹⁴C]-labelled AdoMet. The in vitro RNA–protein crosslinking assay was carried out using iodo-uridine-labelled small RNA substrates as described.

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

METHODS

Protein expression and purification. DNA fragments corresponding to full-length HEN1 were amplified from the cDNA and inserted into the pET28a vector (Novagen) under NcoI and XhoI sites to result in an N-terminal His tag (MGHHHHHH). A double point mutation L604P/K640R was introduced into HEN1 during PCR amplification and L604P was reversed by site-directed mutagenesis to result in the single mutant K640R. Because no differences in methyltransferase activity were observed among the double mutant L604P/K640R, the single mutant K640R and wild-type HEN1 (data not shown), the double mutant L604P/K640R and the single mutant K640R were treated as wild-type HEN1 for protein purification and crystallization in this study unless otherwise indicated. Mutants W333A, E799A/H800A, H860A, R701A and R856A were generated by a PCR-based overlap extension method. E796A, H860Q, Y109A, H800Q as well as the correction of L604P were generated using the QuikChange Lightning site-directed mutagenesis kit (Stratagene). The N-terminal deletion mutant ΔN89 (90–942) was generated by PCR amplification and subcloning. Primers for cloning and mutagenesis are listed in Supplementary Table 2. The presence of the mutations was confirmed by sequencing. The recombinant HEN1 proteins were expressed in E. coli BL21-Gold(DE3) (Stratagene). After induction with 0.2 mM IPTG, the cells were allowed to grow at 17 °C for 20 h. Collected cells were lysed by a C-3 cell disruptor (Avestin) at 4 °C. Proteins were purified by affinity His-Trap column, ion-exchange Q column, heparin column and size-exclusion column Superdex 200 (GE Healthcare). Further chromatography on a Mono-Q column (GE Healthcare) was required to obtain high-quality proteins for crystallization. Purified proteins were concentrated to 15 mg ml⁻¹ in a buffer containing 10 mM HEPES (pH 7.5), 50 mM KCl and 2 mM dithiothreitol (DTT), and flash frozen in liquid nitrogen before storing at −80 °C. SeMet-labelled proteins were produced by inhibiting endogenous methionine biosynthesis in M9 minimal media supplemented with specific amino acids as well as SeMet, and purified as for the native protein. All mutants were expressed and purified as described above.

RNA preparations. Sequences of RNAs used in this study are listed in Supplementary Table 3. All RNA oligonucleotides were synthesized by Dharmacon or Integrated DNA Technologies and further purified with PAGE or HPLC. The concentrations of the RNAs were measured by ultraviolet spectrometry at 260 nm; RNA duplexes used for crystallization and the cross-linking assay were first annealed in a buffer of 30 mM HEPES-K, pH 7.5, 100 mM potassium acetate, 2 mM magnesium acetate. RNA duplexes used for the small RNA methyltransferase assay were annealed in a buffer of 50 mM Tris-HCl, pH 7.6 and 100 mM KCl. Annealing was performed by heating the mixture for 5 min at 95 °C and slowly cooling it to 37 °C followed by incubation for 2 h at 37 °C and 1 h at 24 °C in a thermal cycler. Annealing efficiency was examined by running the annealed products on a 15% polyacrylamide native gel.

Crystallization and data collection. Both miR173/miR173* and miR173/miR173*cm RNA duplexes were used to co-crystallize with HEN1 in the presence of AdoHcy. Only miR173/miR173*cm yielded crystals of sufficient quality for data collection. The ternary complex used for crystallization was prepared by adding 20-fold excess of AdoHcy to HEN1 protein and incubating on ice for 0.5 h followed by the addition of twofold excess of RNA. The final concentration of HEN1 in the complex is about 5 mg ml⁻¹. The initial screening was carried out with commercial crystallization kits using a Phoenix crystallization robot (Art Robbins Instruments) and detected using the Rock Imager 2 and Rock Maker automated imaging system (Formulatrix). The preliminary hit was obtained at condition number 36 of Wizard II Screen (Emerald Biosystems), 0.1 M phosphate-citrate, pH 4.2, 10% PEG 3000 and 0.2 M NaCl. The crystals were optimized using the hanging-drop vapour diffusion method at 20 °C and Additive Screen (Hampton Research) was used during the optimization of the initial condition. Addition of NaBr (Additive Screen, No. 29) markedly improved the quality of the crystals. Finally, the crystals were grown in the solution containing 0.1 M phosphate-citrate, pH 4.8, 15% PEG 3350, 0.2 M NaCl and 0.01 M NaBr. SeMet-labelled crystals were obtained under the same condition as for the native crystals. Crystals were transferred into cryoprotectant solution with 20% glycerol and then flash-frozen in liquid nitrogen. Diffraction tests of collected crystals were performed at 100 K using a Rigaku X-ray generator equipped with R-AXIS IV++ detectors. A multi-wavelength anomalous dispersion (MAD) data set to 3.4 Å was collected on a SeMet-labelled crystal at the Argonne National Laboratory beamline 23ID-B. A native data set to 3.1 Å was collected at beamline 19BM, the Structural Biology Center at the Advanced Photon Source. The diffraction data were processed and scaled with the HKL2000 package. The data collection and processing statistics are summarized in Supplementary Table 1.

Structure determination and refinement. Phases were determined by the multiple-wavelength anomalous dispersion method using 3.4 Å MAD data with the PHENIX package. Out of a total of 26 sites, 18 selenium atoms were located using the program HYSS in PHENIX. Heavy atom refinement and MAD phasing were carried out using programs SOLVE and RESOLVE in PHENIX, and the figure of merit increased from an initial 0.36 to 0.74 after phasing improvement by the program RESOLVE. An initial model of HEN1 was manually built with the programs O and Coot using the locations of SeMet positions as guides. The model of the small RNA duplex was built based on the position of the 5′-phosphate that only exists in the miR173 strand. The initial model of the complex was refined through alternating cycles using the program phenix.refine in PHENIX. Non-crystallographic symmetry was used to restrain the core of two domains in the asymmetric unit while more variation was allowed in the loop regions. The final model was refined to the native data in the resolution range 20–3.1 Å using CNS version 1.2 (ref. 37) until the R/Rfree were 26.0%/28.8% with good stereochemistry. Ramachandran analysis showed that 87.2% of residues are in most favoured regions, 12.8% of residues are in additional allowed regions, and no residues are in the generally allowed or disallowed regions. The final model contains two double mutant L604P/K640R HEN1 molecules including residues 1–6, 213–215, 290–301, 411–454, 501–534, 542–551, 572–599, 839–850, 912–916 and 934–942 of chain A and residues 1–6, 292–303, 410–452, 501–519, 529–534, 540–546, 574–598, 840–851, 910–917 and 934–942 of chain D (Supplementary Fig. 1), two miR173/miR173*cm duplexes, two AdoHcy, two Mg²⁺, and 37 waters (Supplementary Table 1). Residues 535–541 of chain A and residues 535–539 of chain D are not certain owing to the poor electron density in the middle of a long disordered region. The refined structure was validated using PROCHECK. Structural figures were prepared with PyMol (http://www.pymol.org).
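
For readers unfamiliar with the R and Rfree values quoted above, the sketch below shows the conventional crystallographic residual, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|. It is a minimal illustration and not the calculation performed inside CNS or PHENIX: scaling between observed and calculated amplitudes is omitted, and the amplitudes in the example are made up. Rfree uses the same formula, evaluated only on a test set of reflections excluded from refinement.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R factor over paired structure-factor amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have the same length")
    # Sum of absolute differences between observed and calculated amplitudes.
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    # Sum of observed amplitudes.
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Made-up amplitudes for three reflections, for illustration only.
example_r = r_factor([100.0, 200.0, 300.0], [90.0, 210.0, 330.0])
```
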

Small RNA methyltransferase assay. The in vitro small RNA methyltransferase assay monitored by the incorporation of the [¹⁴C]-methyl group was performed as previously described with minor modifications. A 100-µl methyltransferase reaction was set up for the annealed small RNA substrate miR173/miR173* and HEN1 mutants. The reaction mixture contained 50 mM Tris-HCl (pH 8.0), 100 mM KCl, 5 mM MgCl₂, 0.1 mM EDTA, 2 mM DTT, 5% glycerol, 2 µl RNasin (40 U µl⁻¹; Promega), 0.5 µCi S-adenosyl-L-[methyl-¹⁴C] methionine (58.0 mCi mmol⁻¹; Amersham Pharmacia Biosciences), 5 µg purified protein, and 1 nmol RNA substrate. After incubation at 37 °C for 2 h, the reaction was stopped by adding 100 µl 2× proteinase K solution (100 mM Tris-HCl, pH 8.0, 10 mM EDTA, 150 mM NaCl, 2% SDS, and 0.4 mg ml⁻¹ proteinase K) followed by incubation at 65 °C for 15 min. The reaction was then extracted with phenol/chloroform. To precipitate the small RNAs, 1 µl glycogen (5 mg ml⁻¹), 0.1 vol of 3 M NaOAc (pH 5.2), and 2.5 vol of ice-cold 100% ethanol were added to the reaction. The mixture was stored at −80 °C for 2 h and centrifuged at 4 °C for 30 min. The pellet was washed with 100 µl 70% cold ethanol. The RNAs in the pellet were dissolved with 1× RNA loading buffer, heated at 95 °C for 5 min, immediately put on ice, and loaded on to a 15% denaturing polyacrylamide gel with 7 M urea. After electrophoresis, the gel was treated with an autoradiography enhancer (En³Hance from Perkin Elmer) following the manufacturer's instructions and exposed to X-ray film at −80 °C. To assay the Mg²⁺-dependent methyltransferase activity, Mg²⁺ was omitted from all annealing buffers and reactions except as indicated. Different amounts of Mg²⁺ and EDTA were added into individual reactions containing the HEN1 protein and were incubated at room temperature for 15 min before adding small RNA substrates and [¹⁴C]-labelled AdoMet.
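
The amount of labelled AdoMet in each reaction follows from the activity and specific activity given above. The sketch below works through that unit conversion; the function names are illustrative, and the result assumes the quoted 0.5 µCi, 58.0 mCi mmol⁻¹ and 100 µl values apply to a single reaction.

```python
def nmol_from_activity(activity_uci: float, specific_activity_mci_per_mmol: float) -> float:
    """Amount of labelled compound in nmol, from activity and specific activity."""
    activity_mci = activity_uci * 1e-3                    # microcurie -> millicurie
    amount_mmol = activity_mci / specific_activity_mci_per_mmol
    return amount_mmol * 1e6                              # mmol -> nmol


def concentration_um(amount_nmol: float, volume_ul: float) -> float:
    """Concentration in micromolar; 1 nmol per microlitre equals 1 mM."""
    return amount_nmol / volume_ul * 1e3


adomet_nmol = nmol_from_activity(0.5, 58.0)      # about 8.6 nmol of AdoMet
adomet_um = concentration_um(adomet_nmol, 100.0)  # about 86 micromolar in 100 µl
rna_um = concentration_um(1.0, 100.0)             # 1 nmol RNA substrate in 100 µl

print(f"AdoMet: {adomet_nmol:.2f} nmol, {adomet_um:.1f} uM; RNA: {rna_um:.1f} uM")
```

On these numbers the methyl donor is present at several-fold molar excess over the 1 nmol of RNA substrate.
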

The in vitro RNA–protein photochemical crosslinking assay. Small RNA miR173 with a 5-iodouracil (5IU) substitute at U1 or U21 (mutated from A21) (Supplementary Table 3) was 5′-end labelled using γ-³²P-ATP (NEN), and annealed with miR173*cm or used as a single-stranded substrate. The photochemical crosslinking assay was performed as described. Typically, 20-µl reactions containing 0.2 µM RNA and 2 µM HEN1 or mutants were placed in a 1.5-ml microtube and incubated for 20 min on ice. Exposure to the ultraviolet light source (Spectroline, λmax = 312 nm, 330 µW cm⁻²) was at a distance of ~2.5 cm, filtered through a polystyrene Petri dish, for 10 min. Crosslinking products mixed with 2× loading buffer were separated on 12% SDS-polyacrylamide gels, which were exposed to Storage Phosphor Screen (GE Healthcare) and visualized using a Storm PhosphorImager (GE Healthcare).

CORRECTIONS & AMENDMENTS


ERRATUM 


doi:10.1038/nature08492 

Dense packings of the Platonic and 
Archimedean solids 

S. Torquato & Y. Jiao 


Nature 460, 876-879 (2009) 


In Figure 1 of this letter, in the top row ‘A1’ was incorrectly listed as ‘P6’. The correct figure is shown below.

[Figure 1 (corrected): panels labelled P1–P5 and A1–A13.]


ERRATUM 

doi:10.1038/nature08493 

Stable single-unit-cell nanosheets of zeolite 
MFI as active and long-lived catalysts 


Minkee Choi, Kyungsu Na, Jeongnam Kim, Yasuhiro Sakamoto, 
Osamu Terasaki & Ryong Ryoo 


Nature 461, 246-249 (2009) 


In this Letter, the affiliations for Osamu Terasaki were incorrect. This 
author is associated with affiliations 4 and 5 from this Letter, the 
Graduate School of EEWS (WCU), KAIST, and Structural 
Chemistry, Arrhenius Laboratory, Stockholm University, and not 
with Nanoscience and Nanotechnology Research Center, Osaka 
Prefecture University. 
CORRIGENDUM


doi:10.1038/nature08523 

Genotypic sex determination enabled 
adaptive radiations of extinct marine reptiles 
Chris L. Organ, Daniel E. Janes, Andrew Meade & Mark Pagel 


Nature 461, 389-392 (2009) 


In this Letter, ‘Dolichorhynchops osborni’ was incorrectly listed as
‘Dolichorhynchops osburni’ on two occasions in the text.
Life in a monastic lab


A vocational career. 


Joost Uitdehaag 


The bell rang for evensong as Jorge 
attached the power-pack and started his 
gel. He smiled. He liked it when every- 
thing was exactly in time. He left the lab 
and walked towards the chapel. On the 
way he met his older friend, Anselm, who 
hurried along as usual. 

“Slow down,” Jorge whispered. “What's
the use?” 

“What's the use of being slow?” 

“Slow is about taking aim.” 

“Where did you get that from?” 

“A penitence session.” 

“Don't mention those.” 

“You mean they are counterintuitive to a 
fearful old individualist. Really, you should 
join. Maybe even tonight?” 

Anselm just smiled. They stopped talk- 
ing as they entered the chapel. It had a 
pleasing retro ambience — its design influ- 
enced by Le Corbusier's famous Chapel of 
Notre Dame — amid the lab complex of 
the Benedictine Order for Oncology, set in 
a remote valley in the Ardennes. 

For Jorge, his lab was one of the good 
things the great crisis had brought: a total 
reshuffling of drug research, an injection 
of idealism in a world of self-interest. That 
the injection had come from religion was 
no surprise for Jorge. Management gurus 
had been courting religious rules long 
before the crisis. Live for yourself or for 
your community, that was the post-crisis 
choice, and science and religion were both 
community efforts. Scientific monasticism 
was a new synthesis, the ultimate way of 
serving society. 

All the scientists had gathered in the 
chapel, and they started a medieval hymn. 
Singing together was supposed to stimulate 
collaboration and equality, but Jorge was still 
bad at it. During the hymn he worried about 
Anselm. His friend had started to complain 
again about giving up the ‘self’ side of sci- 
ence. He was a former academic and had 
this all-pervading desire to compete and 
establish his name, but within the Order 
that would get him into trouble. They gave 
you a permanent contract and a budget so 
there was no need to worry about grants or 
tenure, but in return the Order demanded 
no double work, no egos and no secrecy. 

If only Anselm had been a pharma man.
Novices from industry generally had less 
trouble giving up the self-side. But then 
again, those who had worked through the 
Barren Years had generally less passion for 
their jobs than a zebrafish for a barcode. 


Jorge wondered why people could not 
simply decide if they really wanted to live 
their undergraduate dreams and work on 
curing disease, or if they wanted something 
else. Anselm always said he was naive. 

“Idealists have a history of getting hurt,” 
he would say. 

“Isn't that the whole point,” Jorge would
reply, “that contributing costs you?” 

“You just haven't suffered yet.” 

Anselm had been damaged by his time 
in academia; that much Jorge knew. That's 
why doing penitence tonight would be 
good for him. It would give him that per- 
fect feeling that all was well and that he 


was living a good life. If only Jorge could 
convince him. 

The singing finished and Abbot Fra Paolini
spoke about the Barren Years. That was
the time when ever larger pharma compa- 
nies and a society ever more hostile to them 
together had driven the cost of developing a 
drug to $2 billion. And what was considered 
worse: to the cost of a thousand scientific 
careers. It had been the scientific equivalent 
of the Somme offensive. 

With a wide movement of his hands, Fra 
Paolini spoke of the day when seven ex- 
pharma scientists had taken up vows in a 
monastery to continue a ‘killed’ project. It 
was a golden move. Their vows of poverty 
(no patenting, no bonuses), chastity (do 
nothing that satisfies only yourself) and 
obedience (listen to what patients want) 
were the right guarantees for patient organ- 
izations and health insurers to pour money 
into monastic research labs. In the past year, 
these labs had developed and published the
majority of new therapies (generics compa- 
nies usually took up marketing them). 

After the ceremony, Jorge waited for 
Anselm. 

“Why do you fear a penitence ses- 
sion? You know they did this all the time 
in the old days: remember Borel and 
cyclosporine? It’s part of our tradition. It 
is why the public likes us.” 

“I don’t fear it. I just don't think it’s 
rational. It’s hysterics.” 

“I'm not hysterical.”

“But you're not joining tonight, are you?”

Jorge did not answer. Anselm stopped 
walking and gave him an angry look. 

“You are! That would be what, the second 
time in a month? You're wasting yourself.”

“The supervisory committee allowed
me.”

“Sure they do. Bunch of vampires, they 


“It has nothing to do with them and all
with me.” Jorge hated being berated.

“I won't allow you,” Anselm said.

“What do you want to do? Swap 
places?” 

“If that’s what it takes.”

Jorge was amazed. Was getting Anselm 
to do penitence really this simple? Was he 
really going to give up his principle for a 
worry about a friend? Anselm never ceased 
to surprise him. 

“All right,” he said.


In the monastery, most clinical trials were 
carried out in the infirmatorium, on a 
veranda filled with the evening’s sunlight. 
Jorge was sitting at Anselm's bed. 

“You are getting chimidinib,” Jorge said,
“the first inhibitor of the Chung-Mi variant 
isomerase. Have you seen the preclinical 
data?” 

“Yes. They're OK.”

Jorge rolled up Anselm’s sleeve as a nurse 
prepared the drip. 

“You want some blood for western blot- 
ting tomorrow?” 

Anselm nodded. “Don't worry,” he said as
the compound started to enter his body. 

But Jorge felt guilty. That night he did 
the only sensible thing: he lit a candle for 
his friend.
Joost Uitdehaag lives in the Netherlands 
and works for a drug-discovery company. 
His writing includes literature on drug 
discovery and fantasy stories for Dutch 
magazines. 
How did your interest in
the molecular biology of 
taste evolve? 

In high school, I was 
intrigued by molecular- 
biology techniques and their 
potential in conducting 
neuroscience studies of 
brain function. In college 
and medical school, my 
professors questioned my 
interest in a field that didn't 
yet exist. But as a postdoc I 
saw colleagues use molecular 
techniques to study vision 
and smell. I realized taste 
was a complete ‘black box’: 
we had no idea how taste 
cells worked at the molecular 
level. So I saw it as an 
opportunity to identify and 
clone taste receptors. 


What was your first 

‘aha’ moment? 

In late 1991, we found a new 
protein in taste cells that was 
closely related to the protein 
transducin, which transmits 
visual signals to the brain. At 
first we took this new insight 
into the taste system to 
mean that if transducin-like 
proteins were in taste cells, 
taste might be closely related 
to other sensory systems 


such as vision. We found that 
although the taste version of 
transducin (ultimately called 
gustducin) was structurally 
similar to transducin, only 
the signalling outputs of the 
two were similar. 


What do you consider to 
be your greatest scientific 
achievement? 

We molecularly 
characterized gustducin’s 
involvement in sweet, 

bitter and umami (the 
monosodium glutamate 
taste). That is a stepping 
stone to further studies of 
taste signalling elements, 
which we have found are 
expressed elsewhere in the 
body and contribute to 
non-taste functions in the 
stomach and pancreas. At 
Monell, I plan to circle back 
to the role of these proteins 
in health and disease. 


Do you get bombarded 
with questions from the 
food industry? 

No, but I have been in 
some interesting forum 
discussions with molecular 
neuroscientists and chefs. 
It is interesting to compare
notes, and at some point
I’m sure we'll get to the
level of understanding how
taste works to apply it to
the creation of a meal or
dessert.

Robert Margolskee, an expert in the molecular
mechanisms of taste, has recently accepted a
faculty position at the Monell Chemical Senses
Center in Philadelphia, Pennsylvania.


What do you value most 
about the scientific 
process? 

There is a purity and a clarity 
in discovery and publication 
that is closely related to 
nature and truth. 


What is the key to 
navigating a successful 
scientific career? 

I wish I knew. I guess it is 

a matter of balance. You 
have to balance everything 
in your life — personal and 
professional, bench work 
and supervising others, what 
appeals to you and what will 
get funded. To be effective 
and successful, you must 
find a way to follow your 
heart and anticipate what 
the journal editors will say. 
But in the end, I think you 
can approach your career in 
different ways and end up 
coming to the same point.


Interview by Virginia Gewin 


POSTDOC JOURNAL

Communicating science

Take one scientist. Blend in professional science communicators. Incubate in
crisp mountain air. The result: an ability and a desire to discuss science
with all sorts of audiences.

Two months ago I'd never made a film, designed a website or written a science
news piece. By the end of August I'd had a major part in all three. How? I was
fortunate to participate in an intensive two-week science-communication
programme at the Banff Centre, a crucible of literary and performance art
nestled in the Rocky Mountains of Alberta, Canada. Participants receive
hands-on training with a broad range of media. Our teachers included
accomplished professionals in television, radio, web and print journalism.
Their mentorship, combined with audio and film equipment and web-design
support, taught me how to talk science successfully via multiple media. Our
group, for example, made lively podcasts, which we called ‘Bunk Debunk’, to
define scientific jargon in clear terms.

I attended the programme, in part, to improve on the campus-radio science show
that I host. Now I feel invigorated: able and eager to talk science
effectively, not only through radio and podcasts but also via television and
print media. My passion for public science communication nearly matches that
for my research. Now I'm certain that I want to nurture a career in science
communication, whether as a sideline to my research or, perhaps, as my
full-time job.

Julia Boughner is a postdoc in evolutionary developmental biology at the
University of Calgary, Canada.
IN BRIEF


Endangered papers 


Conservation scientists take up to three 
times longer to publish their work than 
other biologists, according to a new
study that warns this could affect time- 
critical environmental decisions. Ryan 
O’Donnell and two other PhD students at 
Utah State University in Logan examined 
more than 2,000 articles published in 14 
life-science journals in 2007 to calculate 
the delay between last data collection and 
submission. The median delay for papers 
on conservation was 696 days, compared 
with 189 days for evolution and 605 for 
taxonomy. The authors suggest the hold- 
up arises because many conservation 
biologists do governmental work and have 
other obligations besides publishing. 


Bridges to biotechnology 


Oregon’s engineers and other skilled 
workers who lost their jobs in the 
economic downturn have a new 
alternative, thanks to a recently established 
biotechnology retraining scheme. The 
Bioscience Foundations Program, which 

is jointly funded by a US$136,000 federal- 
stimulus grant and the state of Oregon, 
aims to match people to short internships 
at Portland-based bioscience companies 
— particularly medical-device firms — 
that could lead to permanent positions. 
Internship applicants will be interviewed 
by the firms themselves; those selected will 
be tutored by bioscience experts on process 
and compliance in the industry, as well as 
on issues of policy, environment, ethics, 
and research and development. 


Gender imbalance persists 


Men continue to make up the majority
of doctoral scientists and engineers
in the United States, according to the
US National Science Foundation’s
most recent 2006 Survey of Doctorate
Recipients, released on 24 September. The
report says that men comprise some 68%
of America’s 561,230 doctoral scientists
across all science fields and 90% of the
nation’s 121,520 doctoral engineers. The
last survey, conducted in 2003, found
that men represented 70% of doctoral
scientists and 91% of doctoral engineers.
Of the 10,920 doctoral scientists and
engineers across all fields who are neither
working nor seeking work, three-quarters
are female, the latest report says, up
from the two-thirds reported in the
2003 survey.
RISING STAR?

The Japanese city of Nagoya is aiming to turn a history of manufacturing
success into […] ground for science applications. David Cyranoski surveys
its potential.

‘Skywalk’ in downtown Nagoya.

The car-maker named after its home city of Toyota has brought international
fame and economic vitality to Japan's central coastal region, which has the
city of Nagoya at its heart. The area’s industrial prowess has made it the
country’s most productive manufacturing zone for the past three decades. But
in terms of international standing in science and technology, Nagoya, Japan's
fourth most-populous city, and its university remain under-appreciated and
largely unknown.

“Everyone knows Toyota, but people don’t know Nagoya or the university,”
laments Nagoya University’s vice-president Takashi Miyata. Founded in 1939 as
one of Japan's seven prestigious imperial universities and with roots as a
medical-training institution in the 1870s, Nagoya University has nowhere near
the international visibility of national universities in Tokyo, Kyoto and
Osaka. It stands at 120 in the 2008 Times Higher Education Supplement
rankings; the other three are all in the top 50. And yet four of the seven
Japanese scientists to win a Nobel prize this century were graduates or staff
of the university: chemists Ryoji Noyori and Osamu Shimomura, and theoretical
physicists Toshihide Maskawa and Makoto Kobayashi.

Now the university and city and regional 
governments are planning to transform 
Nagoya into a research hub. They will focus on 
the region’s strong point: monozukuri, which, 
literally translated, means ‘making things’. But
the term also implies integrated production
and the ability to put things together in a
creative fashion, the way that Toyota excels at 
putting together cars, says Miyata. 

Nagoya University crystallographer 
Yoshikazu Takeda describes monozukuri as 
“manufacturing with craftsmanship”. He says 
it requires practitioners to “feel by their finger 
a difference in thickness as small as one 
micrometre”. 

Scientists and science policy-makers are 
now striving to modernize monozukuri,
using academic strengths in physics and 
chemistry, as well as a state-of-the-art 
plasma-research centre and synchrotron 
facilities. If all goes according to plan, they 
will bring Nagoya's science to a new level of 
international visibility and economic fortune. 


Monozukuri makes good 

Efforts to develop applied science with 
industry applications are already poised to 
bear fruit. For example, Nagoya University’s 
Center for Embedded Computing Systems 
was established in 2006 and is supported 
through collaborative research projects 
with several companies, including Toyota. 
In a project co-developed with nearby
AutoNetworks Technologies, the centre has 
applied for eight patents for technology that 
would standardize the networks of a car’s 
80-plus electronic control units. Projects with 
Toyota include the development of a new 
multimedia system that would enable, for 
example, remote diagnosis of car problems. 

But the monozukuri extends far beyond 
cars. Central government set up one of its 
18 ‘knowledge clusters’ in the region in 
2003. Called the Nagoya Nanotechnology 
Manufacturing Cluster, it includes Nagoya 
University, Nagoya Institute of Technology, 
Toyohashi University of Technology and 
Meijo University, as well as 56 local companies 
in the quest “to make environmentally 
friendly, highly functional materials that 
lead the world”. It aims to bring together 
expertise, foster collaborations and seek out 
industrial applications for science projects, 
with a focus on plasma nanotechnology. The 
cluster already includes the Plasma Centre 
for Industrial Applications, which started 
operating last August to help introduce new 
plasma technologies, especially to small and 
medium-sized monozukuri firms. It was 
set up with matching funds from the Aichi 
prefecture, where Nagoya is located. 

Plasma, a partially ionized gas known 
as the fourth state of matter, has valuable 
properties. For example, its free radicals react 
with a substrate surface, enabling the etching 
of patterns for integrated circuits, even at 
room temperature. But finding the best match 
between a given surface, the desired pattern 
and depth and a plasma remains difficult. 

“Plasma scientists don’t really know what's 
happening,’ says Keigo Takeda, a researcher 
on the project from Nagoya University. 
Generally speaking, the process is mostly 
trial-and-error. “There are huge amounts of 
waste,’ says Takeda. 

Takeda and his colleagues, led by Masaru 
Hori, have invented a device that not only 
can tell the type and concentration of the 
radicals in the plasma, but can also guide a 
self-adjusting feedback mechanism to ensure 
stable quality during production. “We're 
aiming for a device that anyone can use, 
one that can really speed up research and 
development,” says Takeda. Hori’s team is
now testing new applications, including the 
processing of materials for parts used in cars 
or aeroplanes. The plasma can be used to add 
a film that hardens the material or makes it 
easier to paint. 

Nanotech cluster projects are already 
spinning off companies and collaborations. 
Hori’s device is being developed by a 
spin-off called NU Eco Engineering. 
Nagoya University’s Osamu Takai is also 
commercializing his biomimetic self- 
assembling monolayers — molecules 
that, under certain conditions, organize 
themselves into layers based on hydrophilic 
or hydrophobic properties. The super- 
hydrophobic films, created with plasma 
processing, are being manufactured by 
Nagoya-based spin-off n-Factory. The 
technology could be used in automobile 
parts, medical equipment, and DNA and 
protein chips, as well as in photoresistors in 
semiconductor manufacturing. The cluster 
has also given rise to Meijo Nano Carbon, 
which sells carbon nanotubes. 


Funding renewed 

In October 2008, the central government 
made this one of only nine clusters to be 
renewed for another five years, earning 

it ¥1 billion (US$11 million) a year from 
the central government, to be matched by 
¥500 million a year from city or prefectural 
governments. 

The government’s decision was partly
influenced by the training of industry- 
minded scientists and collaborations 
with foreign and domestic partners. For 
example, Nagoya University’s Plasma 
Nanotechnology Research Center has 
ties with ten foreign research institutions. 
Infrastructure development also spurred 
government backing. Largest among these 
projects is the Central Japan Synchrotron 
Radiation Research Facility, which has 
received ¥20 billion in funding from the 
Aichi prefectural government. Based east of Nagoya and expected to start
operations in 2012, the synchrotron will swallow ¥8 billion of the budget. It
joins SPring-8 (the world’s most powerful light source) in Hyogo to the west
and Photon Factory northeast in Tohoku, but unlike these, it will have
commercial aims. “It will be built to the specifications, in terms of
reliability and stability, needed for industrial research and development,”
says Yoshikazu Takeda.

Keigo Takeda: cutting waste.

The potential users are many, starting with Toyota. The company has used
SPring-8 to help design exhaust gas catalysts, hybrid car batteries for its
Prius, and fuel cells. “But the rigorous specifications will be a boon for
basic researchers who are able to use the facility,” says Takeda. He says it
will cover 90% of the research topics possible at SPring-8 and will have some
capabilities that SPring-8 does not have. For example, the new synchrotron
will have capacity for long-term studies looking at 1,000 samples in a series
to ensure reliability of industrial products and reproducibility in
basic-science experiments.

Regional policy-makers hope that the infrastructure investment will attract
more venture capital and encourage more university spin-offs. Aichi had 74
university-driven business ventures in 2007, placing it in sixth place in
Japan. Masaki Kato, of the Aichi New Industry Division's science and
technology section, says they could do better. “There is less capital here
than in Tokyo, for example, but because of all the industry there are big
opportunities,” he says. Although the recent recession has caused a drop in
capital, Kato notes that many venture funds have been established, such as
Aichi Venture House, and he is encouraged by a growing interest in industry
among university researchers.

Osamu Takai: commercial goals.

Bioscience efforts also tap the local expertise in nanotechnology and
engineering. Nagoya University’s Innovative Research Center for Preventive
Medical Engineering was established in 2006 to bridge the fields of medicine
and engineering with a budget of ¥1.7 billion for four years (half from the
science ministry and half from local companies). A group led by the centre's
chemist, Yoshinobu Baba, for example, developed a method for separating out
cancer stem cells in a blood sample, and other groups in the centre found new
markers to predict metabolic syndrome and developed techniques to diagnose
milk allergies.

Hoping to commercialize his technology, 
Baba will start working with Nagoya 
University Hospital, which is part of a
medical network with more than 20,000 beds. 
He is confident that commercialization will 
move swiftly. “There are many monozukuri 
companies that want to collaborate with my 
group,” he says. The centre is now applying 
for a ¥4.2-billion renewal of its grant for the 
2010-16 period. 

Nagoya still lacks a significant biotech or 
pharmaceutical presence, especially since 
Pfizer cancelled a planned Nagoya-based 
research arm in 2007 as part of worldwide 
cutbacks. There is, however, some hope: the 
unit's intellectual property was spun off as a
biotechnology company named RaQualia 
Pharma, which has set its sights on an initial 
public offering in 2011. 

But to make good on its international 
aspirations, the city will need a new and 
diverse crowd of scientists. Baba and others 
have foreign postdoctoral students, but 
attaining more senior positions remains 
difficult for foreigners, largely because of 
language barriers, says Yoshikazu Takeda. 
Nagoya University has set up liaison offices 
in eight cities around the world to help with 
recruitment. But even if it continues to 
struggle to match the scientific reputation of 
Japan's better-known hubs in greater Tokyo 
and the Kyoto region, current efforts to 
enhance its capacity for monozukuri could 
soon make Nagoya a major player.
David Cyranoski is Nature's Japan 
correspondent. 
Gal4 turnover and transcription activation


Arising from: K. Nalley, S. A. Johnston & T. Kodadek Nature 442, 1054-1057 (2006) 


Growing evidence supports the notion that proteasome-mediated 
destruction of transcriptional activators can be intimately coupled 
to their function’’. Recently, Nalley et al.’ challenged this view by 
reporting that the prototypical yeast activator Gal4 does not dynami- 
cally associate with chromatin, but rather ‘locks in’ to stable pro- 
moter complexes that are resistant to competition. Here we present 
evidence that the assay used to reach this conclusion is unsuitable, 
and that promoter-bound, active Gal4 is indeed susceptible to com- 
petition in vivo. Our data challenge the key evidence that Nalley et al.’ 
used to reach their conclusion, and indicate that Gal4 functions 
in vivo within the context of dynamic promoter complexes. 

Studies by several groups, including ours'***, have reported an 
intimate connection between the activity of transcriptional activators 
such as Gal4 and Gcn4 and their destruction by ubiquitin-mediated 
proteolysis. This intimate connection is difficult to reconcile with the 
conclusion by Nalley et al.’ that proteolytic turnover of Gal4 is not 
coupled to its function. This conclusion is based on the result of chro- 
matin immunoprecipitation (ChIP) experiments showing that endo- 
genous, active, Gal4 cannot be competed from the GAL1/10 promoter 
by induction of a protein with the same DNA-binding specificity (‘competitor’).
In this case, the competitor contains the hormone-binding
domain of the oestrogen receptor (ER), which allows its DNA-binding
activity to be rapidly induced by the treatment of yeast with β-oestradiol.
Yeast cultures expressing both the competitor and endogenous Gal4 are
treated with β-oestradiol, and ChIP analysis is used to monitor the
binding of the two proteins to the GAL1/10 promoter. 

We obtained reagents from the Kodadek laboratory and repeated 
their experiments. In the course of performing an additional control 
that was not included in their Nature paper, we observed that, in the 
absence of any competitor, β-oestradiol induced an up to fourfold
increase in the levels of Gal4 that associated with the GAL1/10 promoter
(Fig. 1a, blue line). The unexpected ability of β-oestradiol to
induce binding of endogenous Gal4 makes the competition assay 
difficult to interpret, as the compound is simultaneously inducing 
both the competitor and the species being competed. 
To explore this issue further, we repeated the experiment using a
different ER ligand, 4-hydroxytamoxifen (4HT). In the absence of 
competitor, 4HT had little effect on the association of endogenous 
Gal4 with its cognate promoter (Fig. 1a, red line). Consistent with
the different effects of these two ligands on basal association of Gal4 
with chromatin, the two compounds gave very different results in 
the presence of competitor (Fig. 1b). As Nalley et al. published’, 
the addition of β-oestradiol to yeast expressing the competitor protein
resulted in little if any reduction in the levels of endogenous Gal4
at the GAL1/10 promoter, creating the impression that most promoter-bound
Gal4 resisted competition (Fig. 1b, blue line). In
the presence of 4HT, however, the opposite result was obtained, 
and ~75% of endogenous Gal4 was competed from the chromatin 
within 15 min of the ligand addition (Fig. 1b, red line). Notably, 
the loss of Gal4—chromatin association was accompanied by loading 
of the competitor onto the GAL1/10 promoter (Fig. 1c), consistent 
with the notion that the 4HT-activated competitor can displace 
endogenous Gal4 from the promoter. Although the competitor 
protein associated with the GAL1/10 promoter with apparently
slower kinetics than endogenous Gal4 dissociated (compare red
line in Fig. 1b with pink line in Fig. 1c), it is worth noting that
endogenous Gal4 can bind cooperatively to several sites in vivo’. 
There are four Gal4-binding sites in the GAL1/10 promoter. Thus, 
a single competitor bound to one of the sites could have the effect 
of destabilizing multiple Gal4–promoter complexes, leading to efficient
displacement of endogenous Gal4 at substoichiometric levels of
competitor.


On the basis of our observations, we propose that the recalcitrance 
of Gal4—promoter complexes originally reported by Nalley et al.’ is an 
artefact of using β-oestradiol to stimulate the competitor. Activating
the competitor with 4HT (Fig. 1b), or normalizing the β-oestradiol
signal to the important ‘no competitor’ control (Fig. 1d), shows that
Gal4 can indeed be rapidly displaced from promoter DNA in vivo.
Their conclusion that Gal4–promoter complexes lock in and have
long half-lives under activating conditions is thus unsustainable. 
Figure 1 | Activation of a Gal4 competitor with
β-oestradiol versus 4HT. a, Wild-type yeast were
induced with 2% galactose for 60 min and
β-oestradiol or 4HT was added. At indicated
times, the occupancy of endogenous Gal4 on the
GAL1/10 promoter was determined by ChIP.
ChIP signal is normalized to that at time zero.
b, As in a, except that experiment was performed
in yeast expressing the Myc–G4–ER–VP16
competitor (supplied by T. Kodadek). c, As in
the 4HT experiment in b, except that ChIP was
used to monitor association of the
Myc–G4–ER–VP16 competitor with the GAL1/10
promoter. The corresponding non-competitor
controls are also shown. To calculate the
percentage binding in this case, ChIP signals were
normalized to those from a Myc–G4–ER–VP16
ChIP (60-min time point) performed in the
absence of endogenous Gal4, which corresponds
to the total amount of competitor that can bind in
this assay. d, ChIP signals from β-oestradiol or
4HT experiments in b normalized to the relevant
‘no competitor’ control in a. Error bars are s.e.m.
(n = 3).
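
The two normalizations described in this caption (to the time-zero signal, and to the matching ‘no competitor’ control) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function names and the example numbers are hypothetical, chosen to mimic a ligand that by itself raises Gal4 occupancy up to fourfold.

```python
def normalize_to_time_zero(signals):
    """Express each ChIP signal relative to the first (time-zero) value."""
    baseline = signals[0]
    if baseline == 0:
        raise ValueError("time-zero signal must be non-zero")
    return [s / baseline for s in signals]


def normalize_to_control(with_competitor, no_competitor):
    """Divide each time point by the matching 'no competitor' value."""
    if len(with_competitor) != len(no_competitor):
        raise ValueError("time courses must have the same length")
    return [w / c for w, c in zip(with_competitor, no_competitor)]


# Hypothetical raw ChIP signals at successive time points (arbitrary units).
no_competitor_raw = [2.0, 4.0, 8.0]    # ligand alone raises Gal4 occupancy
with_competitor_raw = [2.0, 2.0, 2.0]  # occupancy appears unchanged

control = normalize_to_time_zero(no_competitor_raw)
competed = normalize_to_time_zero(with_competitor_raw)
corrected = normalize_to_control(competed, control)
print(corrected)
```

With these made-up values, a signal that looks flat in the presence of competitor corresponds to a substantial loss of occupancy once the ligand-induced rise in the control is taken into account.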
METHODS

Yeast (BY4741) with or without competitor (Myc–G4–ER–VP16) were grown
in complete synthetic medium (CSM) (2% raffinose) and Gal4 was induced by
transferring yeast to media containing 2% galactose for 1 h. Yeast were then
treated with 1 µM 17-β oestradiol (Sigma) or 100 µM 4-hydroxytamoxifen
(Sigma) for the indicated times. ChIP was performed using either the Gal4-TA
C-10 (anti-GAL4; Santa Cruz) or AB1 (anti-Myc; Calbiochem) antibodies.
DNA enrichment was calculated as described using ACT1 as the reference locus.
Nalley et al. reply


Replying to: G. A. Collins, J. R. Lipford, R. J. Deshaies & W. P. Tansey Nature 461, doi:10.1038/nature08406 (2009) 


Proteasome-mediated turnover of some'”, but clearly not all**, tran- 
scriptional activators is important for their activity. To facilitate the 
analysis of activator-promoter complex lifetime in vivo, a parameter 
relevant to this issue, we developed a competition chromatin immuno- 
precipitation (ChIP) assay in which binding of a native transactivator 
to its cognate promoters is challenged by a ligand-activated competi- 
tor protein with the same DNA-binding specificity. We applied this 
technique to the yeast Gal4 system® and concluded that under non- 
inducing conditions (raffinose media) Gal4-promoter complexes 
exchange rapidly, but under inducing conditions (galactose media) 
the activator-promoter complexes are long-lived. Collins et al.° report 
that, surprisingly, the addition of oestradiol to yeast lacking Myc-G4- 
ER-VP16 increased the amount of DNA co-immunoprecipitated with 
native Gal4. 

This is a control we had not done; we have since carried it out 
and agree that this is the case (S.A.J. wishes to note that he had 
requested this control and it erroneously was not done). We thank 
Collins et al.° for pointing out this omission. They go on to show that 
inducing competitor protein activity with 4-hydroxytamoxifen 
(4HT) results in a significant loss in the intensity of the ChIP signal 
owing to native Gal4, but that this ligand does not affect the intensity 
of these ChIP signals in the absence of competitor. They also show 
significant association of the competitor protein with the promoter, 
although with a different time course than Gal4 dissociation. We 
agree that these data indicate that a large fraction of Gal4-promoter 
complexes are kinetically labile in vivo under these (4HT-containing 
media) conditions. It is important to note that this odd effect of 
steroid is not a general problem in the application of this technology 
to the measurement of other activator-promoter half-lives’. 

However, our ChIP data tracking association of the competitor 
protein do not support the conclusion of a rapidly exchanging 
Gal4-DNA complex in vivo in the presence of β-oestradiol rather 
than 4HT. There is no indication that these data are compromised 
by unanticipated effects of β-oestradiol. Under inducing conditions, 
much lower levels of association of the competitor protein with GAL 
promoters were observed when Gal4 was present than in Δgal4 cells 
when β-oestradiol was used to trigger the competition. These data 
argue for the presence of a stable, functional Gal4-promoter complex 
in the presence of galactose and under the particular conditions used 
in our study°. It may be that the stability of Gal4-promoter com- 
plexes is somehow affected by steroid receptor ligands, which would 
explain the different results observed by ourselves and Collins et al.° 
for the association of the competitor protein in our respective 
experiments. 